UNIT 9 Matrices and determinants


Study guide for Unit 9 


This is the first of four units devoted to the study of mathematical methods
involving matrices. Much of the material in this unit and the next will provide
you with the necessary background for solving systems of linear differential
equations in Unit 11 and systems of non-linear differential equations in
Unit 13.


You may have met matrices before; in this unit we review most of their 
basic properties and show you how they can be applied to solving systems 
of linear equations. 


Sections 1 and 2 are independent and could be interchanged if you wish. 
You may find it helpful to read Section 2 before Section 1 if you have not 
met matrices before. 


Section 5 is a computer session where you will use the computer algebra 
package for the course. This section contains computer activities that are 
associated with Sections 1, 2, 3 and 4. 




Introduction 


In this unit we shall examine some of the properties and applications of
matrices, where an $m \times n$ matrix is a rectangular array of elements with
$m$ rows and $n$ columns. For example, an array with two rows and three columns
is a $2 \times 3$ matrix.

Among their many applications, matrices are useful in describing electrical 
circuits, in the analysis of forces and in writing down the equations of motion
for a system of particles; and they form an essential component of an
engineer’s or scientist’s toolkit. Matrices have a role to play whenever we 
need to manipulate arrays of numbers; in applications, m and n may be very 
large, so do not be misled by the fact that the discussion concentrates on 
small matrices. 


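In applied mathematics, one common problem is to solve a system of equations
involving unknown constants, i.e. to determine values of the constants
that satisfy the equations. Matrices can be used to store details of such a
problem. For example, a system of equations such as

\[
\left\{
\begin{aligned}
2x + 3y &= 8, \\
4x - 5y &= -6,
\end{aligned}
\right. \tag{0.1}
\]

contains three relevant pieces of information:

• the numbers on the left-hand sides of the equations, which can be stored
in the $2 \times 2$ matrix
\[ \begin{bmatrix} 2 & 3 \\ 4 & -5 \end{bmatrix}; \]

• the constants to be determined, which can be stored in the $2 \times 1$ matrix
\[ \begin{bmatrix} x \\ y \end{bmatrix}; \]

• the numbers on the right-hand sides of the equations, which can be
stored in the $2 \times 1$ matrix
\[ \begin{bmatrix} 8 \\ -6 \end{bmatrix}. \]

With this notation, the essential information in these equations can be
written in the form

\[
\begin{bmatrix} 2 & 3 \\ 4 & -5 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
=
\begin{bmatrix} 8 \\ -6 \end{bmatrix}. \tag{0.2}
\]

If we put

\[
\mathbf{A} = \begin{bmatrix} 2 & 3 \\ 4 & -5 \end{bmatrix}, \quad
\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} 8 \\ -6 \end{bmatrix},
\]

then Equation (0.2) can be written as
\[ \mathbf{A}\mathbf{x} = \mathbf{b}. \tag{0.3} \]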


For the moment you may regard Equation (0.2) as merely a convenient
shorthand for the original system of equations, but later we shall see that
it is compatible with matrix multiplication. Generally the matrix $\mathbf{A}$ will be
an $n \times n$ matrix, while $\mathbf{x}$ and $\mathbf{b}$ are vectors containing $n$ elements, where $n$
may be large. In this unit we shall be concerned with the solutions of such
systems, which are known as systems of linear equations, and the problem
of finding a solution can be expressed as one of finding the vector $\mathbf{x}$ that
satisfies Equation (0.3).


In this course the elements of 
matrices are real numbers. 
Other words for element are 
component and entry. 


The use of the curly bracket 
here emphasizes that we are 
dealing with a system of 
equations. 


A matrix with one column is
often referred to as a vector,
or sometimes as a column
vector. You met vectors in
Unit 4.


The term linear comes from 
the fact that each of the 
equations can be represented 
graphically by a straight line. 




There is a graphical interpretation of the system of equations (0.1). Each
equation represents a straight line, as you can see by rearranging the
equations as $y = -\tfrac{2}{3}x + \tfrac{8}{3}$ and $y = \tfrac{4}{5}x + \tfrac{6}{5}$. The solution of this system of
equations thus lies at the point of intersection of the graphs of the straight lines,
as illustrated in Figure 0.1. For this pair of equations, there is just one
solution, i.e. $x = 1$, $y = 2$.


Figure 0.1 


For two equations in two unknowns it is easy to draw graphs or to manipulate 
the equations to determine the solution. It would be much more difficult 
if we were faced with a problem involving 100 equations in 100 unknowns. 
Then we would need a systematic method of working with matrices to obtain 
the solution, because this would enable us to program a computer to solve 
systems of linear equations. 


You may already have met a matrix method of solving a system of two
linear equations in two unknowns using the inverse of a matrix. Although
this method works well for $2 \times 2$ matrices, it is not very efficient for large
systems of equations compared with other methods.

Matrices and matrix operations are revised in Section 2.


In Section 1 you will be introduced to a matrix method of solving large
systems of linear equations, called the Gaussian elimination method, and
you will see the conditions required for the method to work. In Section 2
we review some of the properties of matrices that make them so useful,
and introduce the determinant of a matrix (a concept that is important for
the discussion of eigenvalues in the next unit). In Section 3 we introduce
two applications. In the first we investigate the problem of determining a
polynomial of degree $n$ that passes through $n + 1$ data points. The second
application is the least squares method, which allows us to find the ‘best’
polynomial of degree $n$ when there are more than $n + 1$ data points. Section 4
shows that in certain situations numerical errors can accumulate and render
a solution unreliable. Section 5 is a computer session where you can explore
the ideas and methods of Sections 1, 2, 3 and 4.

What is meant by ‘best’ will be explained in Section 3.


1 Simultaneous linear equations 


The purpose of this section is to outline an efficient systematic method of 
obtaining the solution of a system of linear equations, known as Gaussian 
elimination. In Subsection 1.1 we introduce the method by manipulating 
equations, then in Subsection 1.2 we do the same calculations using matrices. 
To complete this section we look at the types of solution that can arise when 
solving such systems of linear equations. We illustrate the method by solving 
systems of three equations in three unknowns. 




1.1 Manipulating equations 


We begin with an example where the solution of a system of three equations 
in three unknowns can be found fairly easily. 


Example 1.1

Find the solution of the system of equations
\[
\begin{aligned}
x_1 - 4x_2 + 2x_3 &= -9, && E_1 \\
10x_2 - 3x_3 &= 34, && E_2 \\
2x_3 &= 4. && E_3
\end{aligned}
\]

Solution

We can find the values of $x_1$, $x_2$ and $x_3$ from $E_1$, $E_2$ and $E_3$. Starting
with $E_3$, we obtain $x_3 = 2$. Substituting this value into $E_2$, we obtain
$10x_2 - (3 \times 2) = 34$, so $x_2 = 4$. Substituting the values for $x_2$ and $x_3$ into $E_1$,
we obtain $x_1 - (4 \times 4) + (2 \times 2) = -9$, so $x_1 = 3$. Hence the solution is
$x_1 = 3$, $x_2 = 4$, $x_3 = 2$.
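The back substitution used in Example 1.1 can be sketched in a few lines of Python. This is only an illustration: the function name `back_substitute` is our own choice, and exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction

def back_substitute(U, c):
    """Solve U x = c, where U is upper triangular with non-zero diagonal."""
    n = len(U)
    x = [Fraction(0)] * n
    # Start with the last equation and work back to the first.
    for i in range(n - 1, -1, -1):
        # Contribution of the unknowns that have already been found.
        known = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (Fraction(c[i]) - known) / U[i][i]
    return x

# Coefficients and right-hand sides of E1, E2 and E3 in Example 1.1.
U = [[1, -4, 2],
     [0, 10, -3],
     [0, 0, 2]]
c = [-9, 34, 4]
solution = back_substitute(U, c)
print(solution)
```

The printed values are $x_1$, $x_2$ and $x_3$ in that order.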


The system of equations in the above example is easy to solve because the
equations are in upper triangular form, i.e. the first non-zero coefficient
in $E_i$ is the coefficient of $x_i$. For a system of equations of the form

\[
\begin{aligned}
x_1 - 4x_2 + 2x_3 &= -9, && E_1 \\
3x_1 - 2x_2 + 3x_3 &= 7, && E_2 \\
8x_1 - 2x_2 + 9x_3 &= 34, && E_3
\end{aligned}
\]

the objective of the first stage of the Gaussian elimination process is to 
manipulate the equations so that they are in upper triangular form. The 
second stage of the process is to solve the system of equations in upper 
triangular form using back substitution, where, starting with the last 
equation, we work back to the first equation, substituting the known values, 
in order to determine the next value, as demonstrated in Example 1.1. 


The key property of the equations that we shall use is that we may add
and subtract multiples of them without affecting the desired solution. For
example, $x_1 = 3$, $x_2 = 4$, $x_3 = 2$ satisfies both of the equations

\[
\begin{aligned}
x_1 - 4x_2 + 2x_3 &= -9, && E_1 \\
3x_1 - 2x_2 + 3x_3 &= 7. && E_2
\end{aligned}
\]

Suppose that we form $E_{2a}$ by subtracting 3 times $E_1$ from $E_2$, so that
$E_{2a} = E_2 - 3E_1$, i.e.

\[ 3x_1 - 2x_2 + 3x_3 - 3(x_1 - 4x_2 + 2x_3) = 7 - 3(-9), \]
giving

\[ 10x_2 - 3x_3 = 34. \qquad E_{2a} \]

Then $x_1 = 3$, $x_2 = 4$, $x_3 = 2$ also satisfies $E_{2a}$.

More generally, we could form an equation $E_{2a}$ by writing $E_{2a} = pE_1 + qE_2$,
for any numbers $p$ and $q$. Then $E_{2a}$ is said to be a linear combination
of $E_1$ and $E_2$, and again $x_1 = 3$, $x_2 = 4$, $x_3 = 2$ satisfies $E_{2a}$. Our
strategy in the elimination stage is to form linear combinations of equations
to reduce the system to upper triangular form. The Gaussian elimination
method uses a particular algorithm in which linear combinations of the form
$E_{2a} = E_2 - mE_1$ are used, as we shall see in the following discussion and
examples.
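As a quick check of this property, the following Python sketch stores each equation as a list of its coefficients followed by its right-hand side, and forms linear combinations of $E_1$ and $E_2$. The helper names `combine` and `satisfies` are illustrative only.

```python
def combine(p, eq1, q, eq2):
    """Return the linear combination p*eq1 + q*eq2 of two equations."""
    return [p * u + q * v for u, v in zip(eq1, eq2)]

def satisfies(eq, x):
    """Check whether the values x satisfy the equation eq."""
    *coeffs, rhs = eq
    return sum(a * xi for a, xi in zip(coeffs, x)) == rhs

E1 = [1, -4, 2, -9]   # x1 - 4*x2 + 2*x3 = -9
E2 = [3, -2, 3, 7]    # 3*x1 - 2*x2 + 3*x3 = 7
solution = (3, 4, 2)

E2a = combine(-3, E1, 1, E2)   # E2a = E2 - 3*E1
print(E2a)
print(satisfies(E2a, solution))
# Any other choice of p and q gives an equation with the same property.
print(satisfies(combine(2, E1, 5, E2), solution))
```

The first line printed holds the coefficients and right-hand side of $E_{2a}$, with a zero in the position of $x_1$.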


For ease of reference, we label
the equations $E_1$, $E_2$ and $E_3$.
The unknowns in our
equations will always be
written in the order
$x_1, x_2, x_3$.


Stage 1: elimination 


Stage 2: back substitution 


An algorithm is a procedure 
or set of rules to be used in a 
calculation. 




Example 1.2 


Use the Gaussian elimination method to reduce the following simultaneous
equations (see above) to upper triangular form. Hence deduce the solution.

\[
\begin{aligned}
x_1 - 4x_2 + 2x_3 &= -9 && E_1 \\
3x_1 - 2x_2 + 3x_3 &= 7 && E_2 \\
8x_1 - 2x_2 + 9x_3 &= 34 && E_3
\end{aligned}
\]

Solution
Solution 


Stage 1: elimination 


We begin with the elimination stage, where we eliminate $x_1$ (in Stage 1(a))
and then $x_2$ (in Stage 1(b)).

Stage 1(a) To eliminate $x_1$, first we subtract a multiple of $E_1$ from $E_2$ to
obtain a new equation with no term in $x_1$. Subtracting 3 times $E_1$ from $E_2$
gives

\[ 10x_2 - 3x_3 = 34. \qquad E_{2a} \]

Now we subtract a multiple of $E_1$ from $E_3$ to obtain a new equation with
no term in $x_1$. Subtracting 8 times $E_1$ from $E_3$ gives

\[ 30x_2 - 7x_3 = 106. \qquad E_{3a} \]

So, on completion of Stage 1(a), the equations have been reduced to

\[
\begin{aligned}
x_1 - 4x_2 + 2x_3 &= -9, && E_1 \\
10x_2 - 3x_3 &= 34, && E_{2a} \\
30x_2 - 7x_3 &= 106. && E_{3a}
\end{aligned}
\]

Stage 1(b) Next we eliminate $x_2$. We see that $E_{2a}$ and $E_{3a}$ are two
equations in two unknowns. So we subtract a multiple of $E_{2a}$ from $E_{3a}$ to obtain
an equation $E_{3b}$ with no term in $x_2$. Subtracting 3 times $E_{2a}$ from $E_{3a}$ gives

\[ 2x_3 = 4. \qquad E_{3b} \]

At this point, the elimination process is finished. We have brought our
equations into upper triangular form as

\[
\begin{aligned}
x_1 - 4x_2 + 2x_3 &= -9, && E_1 \\
10x_2 - 3x_3 &= 34, && E_{2a} \\
2x_3 &= 4. && E_{3b}
\end{aligned}
\]


Stage 2: back substitution 
We now solve this system of equations in the back substitution stage. 


This system was solved in Example 1.1, using back substitution, to give the
solution $x_1 = 3$, $x_2 = 4$, $x_3 = 2$. Checking that the solution satisfies the
original system of equations, we have

\[
\begin{aligned}
\text{LHS of } E_1 &= x_1 - 4x_2 + 2x_3 = (1 \times 3) - (4 \times 4) + (2 \times 2) = -9 = \text{RHS}, \\
\text{LHS of } E_2 &= 3x_1 - 2x_2 + 3x_3 = (3 \times 3) - (2 \times 4) + (3 \times 2) = 7 = \text{RHS}, \\
\text{LHS of } E_3 &= 8x_1 - 2x_2 + 9x_3 = (8 \times 3) - (2 \times 4) + (9 \times 2) = 34 = \text{RHS}.
\end{aligned}
\]


The process of solving simultaneous equations described in Examples 1.1 
and 1.2 is called the Gaussian elimination method. The method provides 
a systematic approach to the problem of solving simultaneous equations that 
should cope with the rather larger sets of equations that can occur in real-life
applications. In practice, hand calculation will almost certainly involve
fractions, and computer calculations will involve numeric approximations to 
these fractions and so will introduce rounding errors. We have avoided such 
problems in Example 1.2 and in the next exercise. 
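The difference between exact fractions and their numeric approximations can be seen in a very small Python experiment. This is an illustration of rounding in floating-point arithmetic in general, not of any particular package used in the course.

```python
from fractions import Fraction

# Floating-point numbers store only an approximation to most fractions,
# so an identity that holds exactly can fail after rounding.
float_exact = (0.1 + 0.2 == 0.3)

# The same sum carried out with exact fractions.
fraction_exact = (Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))

print(float_exact, fraction_exact)
```

In a long elimination, many such small discrepancies can build up, which is why rounding errors need attention in computer calculations.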


Each step of the Gaussian 
elimination algorithm is 
indicated in the margin 
below. 


$E_{2a} = E_2 - 3E_1$


$E_{3a} = E_3 - 8E_1$


$E_{3b} = E_{3a} - 3E_{2a}$


We do not give a formal 
procedure for the Gaussian 
elimination method at this 
point, since we give a matrix 
formulation shortly. 




*Exercise 1.1 


Solve the following simultaneous equations using the Gaussian elimination
method.

\[
\begin{aligned}
x_1 + x_2 - x_3 &= 2 && E_1 \\
5x_1 + 2x_2 + 2x_3 &= 20 && E_2 \\
4x_1 - 2x_2 - 3x_3 &= 15 && E_3
\end{aligned}
\]


1.2 Manipulating matrices 


The Gaussian elimination method relies on manipulating equations. In this 
subsection we shall see that it can be formulated efficiently in terms of 
matrices. 


When we use the Gaussian elimination method to solve the equations in
Example 1.2, the new equation $E_{2a}$ is obtained by subtracting a multiple of
$E_1$ from $E_2$. The actions are performed on the numbers multiplying $x_1$, $x_2$
and $x_3$ (the coefficients of $x_1$, $x_2$ and $x_3$). For instance, the operation on
the coefficients of $x_2$ in the equation $E_{2a} = E_2 - 3E_1$, namely
$-2x_2 - 3 \times (-4x_2) = 10x_2$, could be recorded as $-2 - 3 \times (-4) = 10$, provided that we remember
that the operation is associated with $x_2$.


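Thus, during the elimination stage, we need to record just the coefficients
of $x_1$, $x_2$ and $x_3$, and the right-hand side of each equation, rather than the
whole system of equations each time. We record the coefficients in $E_1$, $E_2$
and $E_3$ in a coefficient matrix $\mathbf{A}$, the unknown constants in a vector $\mathbf{x}$,
and the right-hand sides in a right-hand-side vector $\mathbf{b}$ as

\[
\mathbf{A} = \begin{bmatrix} 1 & -4 & 2 \\ 3 & -2 & 3 \\ 8 & -2 & 9 \end{bmatrix}, \quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} -9 \\ 7 \\ 34 \end{bmatrix}.
\]

The problem, in terms of matrices, is to determine the vector $\mathbf{x}$ that satisfies
the equation $\mathbf{A}\mathbf{x} = \mathbf{b}$, which can be written as

\[
\underbrace{\begin{bmatrix} 1 & -4 & 2 \\ 3 & -2 & 3 \\ 8 & -2 & 9 \end{bmatrix}}_{\mathbf{A}}
\underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}}_{\mathbf{x}}
=
\underbrace{\begin{bmatrix} -9 \\ 7 \\ 34 \end{bmatrix}}_{\mathbf{b}}.
\]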


For computing purposes it is sufficient just to record the information in the
augmented matrix

\[
\mathbf{A}|\mathbf{b} = \left[\begin{array}{rrr|r}
1 & -4 & 2 & -9 \\
3 & -2 & 3 & 7 \\
8 & -2 & 9 & 34
\end{array}\right].
\]

The first row of $\mathbf{A}|\mathbf{b}$ contains the coefficients of $x_1$, $x_2$ and $x_3$ and the
right-hand side of $E_1$, the second row contains similar information about $E_2$, and
the third row contains similar information about $E_3$.
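A minimal Python sketch of this bookkeeping is given below, using plain lists for the matrix and vectors of Example 1.2. The function names `augment` and `mat_vec` are hypothetical helpers chosen for this illustration.

```python
A = [[1, -4, 2],
     [3, -2, 3],
     [8, -2, 9]]
b = [-9, 7, 34]

def augment(A, b):
    """Form the augmented matrix A|b by appending each right-hand side to its row."""
    return [row + [rhs] for row, rhs in zip(A, b)]

def mat_vec(A, x):
    """Multiply the matrix A by the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

Ab = augment(A, b)
for row in Ab:
    print(row)

# The solution found in Example 1.2 satisfies A x = b.
print(mat_vec(A, [3, 4, 2]) == b)
```

Each printed row holds the three coefficients of one equation followed by its right-hand side.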

*Exercise 1.2 


Write the following systems of equations in augmented matrix form. 


_ r1 + 22% = 3 

ee 2 by oe See A 
4a, + Trg = 11 

tg —-%#3=-1 
An alternative description of
$\mathbf{A}|\mathbf{b}$ is to say that the first
three columns of $\mathbf{A}|\mathbf{b}$
represent, respectively, the
coefficients of $x_1$, $x_2$ and $x_3$,
and the column after the bar
represents the right-hand
sides of the equations.




Once the information has been written in augmented matrix form, the stages 
in the Gaussian elimination process are equivalent to manipulating the rows 
of the matrix, as we demonstrate in the next example. 


Example 1.3 


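Solve the simultaneous equations of Example 1.2 using the matrix form of
the Gaussian elimination method, where

\[
\mathbf{A}|\mathbf{b} = \left[\begin{array}{rrr|r}
1 & -4 & 2 & -9 \\
3 & -2 & 3 & 7 \\
8 & -2 & 9 & 34
\end{array}\right].
\]

Solution

We use the matrix representing these equations throughout the elimination
procedure. For brevity, we denote the first row of the matrix, namely
$[1 \;\; -4 \;\; 2 \;|\; -9]$, by $\mathbf{R}_1$, and so on.

\[
\left[\begin{array}{rrr|r}
1 & -4 & 2 & -9 \\
3 & -2 & 3 & 7 \\
8 & -2 & 9 & 34
\end{array}\right]
\begin{array}{l}
\mathbf{R}_1 \\ \mathbf{R}_2 \\ \mathbf{R}_3
\end{array}
\]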


Stage 1: elimination 


Each part of the elimination stage in Example 1.2 has an equivalent part in 
matrix form. 


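Stage 1(a) First we eliminate $x_1$, as before. Equation $E_{2a}$ in Example 1.2
was found by subtracting $3E_1$ from $E_2$. Arithmetically, this is the same
operation as subtracting $3\mathbf{R}_1$ from $\mathbf{R}_2$, and it is useful to record this operation.
The way in which the new rows are formed is recorded on the left of the
matrix.

\[
\begin{array}{r}
{} \\ \mathbf{R}_2 - 3\mathbf{R}_1 \\ \mathbf{R}_3 - 8\mathbf{R}_1
\end{array}
\left[\begin{array}{rrr|r}
1 & -4 & 2 & -9 \\
0 & 10 & -3 & 34 \\
0 & 30 & -7 & 106
\end{array}\right]
\begin{array}{l}
\mathbf{R}_1 \\ \mathbf{R}_{2a} \\ \mathbf{R}_{3a}
\end{array}
\]

Stage 1(b) Now we eliminate $x_2$, as before.

\[
\begin{array}{r}
{} \\ {} \\ \mathbf{R}_{3a} - 3\mathbf{R}_{2a}
\end{array}
\left[\begin{array}{rrr|r}
1 & -4 & 2 & -9 \\
0 & 10 & -3 & 34 \\
0 & 0 & 2 & 4
\end{array}\right]
\begin{array}{l}
\mathbf{R}_1 \\ \mathbf{R}_{2a} \\ \mathbf{R}_{3b}
\end{array}
\]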


A triangle of zeros appears in the matrix at the end of the elimination stage. 
This shows us that each equation has one fewer unknown than the previous 
one, which is what we need in order to do the back substitution. 


The final coefficient matrix is known as an upper triangular matrix, 
since the only non-zero elements it contains are on or above the leading 
diagonal, i.e. the diagonal from top-left to bottom-right, here containing 
the numbers 1, 10 and 2. 


Stage 2: back substitution 


Before carrying out the back substitution stage, we write the final matrix
as a system of equations:

\[
\begin{aligned}
x_1 - 4x_2 + 2x_3 &= -9, && E_1 \\
10x_2 - 3x_3 &= 34, && E_2 \\
2x_3 &= 4. && E_3
\end{aligned}
\]

This is exactly the same as in Example 1.1. The solution $x_1 = 3$, $x_2 = 4$,
$x_3 = 2$ is then found using back substitution as before.


Notice that $\mathbf{R}_i$ encapsulates
all the information about the
equation $E_i$.


Adding or subtracting a 
multiple of one row to/from 
another is called a row 
operation. 


It is essential to keep a record 
of the derivation of each row 
if you wish to check your 
working. 


The leading diagonal is 
sometimes called the main 
diagonal. 


The objective of Stage 1 of the Gaussian elimination method is to reduce 
the matrix A to an upper triangular matrix, U say. At the same time, the 
right-hand-side vector b is transformed into a new right-hand-side vector, c 
say, where, in this example, 


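\[
\mathbf{U} = \begin{bmatrix} 1 & -4 & 2 \\ 0 & 10 & -3 \\ 0 & 0 & 2 \end{bmatrix}, \quad
\mathbf{c} = \begin{bmatrix} -9 \\ 34 \\ 4 \end{bmatrix}.
\]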


The Gaussian elimination method in Procedure 1.1 underpins the algorithm 
used by the computer algebra package for the course. 


Procedure 1.1 Gaussian elimination method 


To solve a system of $n$ linear equations in $n$ unknowns, with coefficient
matrix $\mathbf{A}$ and right-hand-side vector $\mathbf{b}$, carry out the following steps.

(a) Write down the augmented matrix $\mathbf{A}|\mathbf{b}$ with rows $\mathbf{R}_1, \ldots, \mathbf{R}_n$.

(b) Subtract multiples of $\mathbf{R}_1$ from $\mathbf{R}_2, \mathbf{R}_3, \ldots, \mathbf{R}_n$ to reduce the
elements below the leading diagonal in the first column to zero.
In the new matrix obtained, subtract multiples of $\mathbf{R}_2$ from $\mathbf{R}_3, \mathbf{R}_4,
\ldots, \mathbf{R}_n$ to reduce the elements below the leading diagonal in the
second column to zero.
Continue this process until $\mathbf{A}|\mathbf{b}$ is reduced to $\mathbf{U}|\mathbf{c}$, where $\mathbf{U}$ is an
upper triangular matrix.

(c) Solve the system of equations with coefficient matrix $\mathbf{U}$ and
right-hand-side vector $\mathbf{c}$ by back substitution.
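Procedure 1.1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the algorithm of the course's computer algebra package: the function name `gaussian_elimination` is our own, exact fractions are used to avoid rounding, and no row interchanges are attempted, so the sketch simply stops if a pivot is zero.

```python
from fractions import Fraction

def gaussian_elimination(A, b):
    """Return (U|c, multipliers, x) for the system A x = b, following Procedure 1.1."""
    n = len(A)
    # (a) Write down the augmented matrix A|b.
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    multipliers = []
    # (b) Reduce the elements below the leading diagonal to zero, column by column.
    for k in range(n - 1):
        pivot = M[k][k]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot: this sketch does not interchange rows")
        for i in range(k + 1, n):
            m = M[i][k] / pivot
            multipliers.append(m)
            M[i] = [u - m * v for u, v in zip(M[i], M[k])]
    # (c) Solve U x = c by back substitution.
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        known = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - known) / M[i][i]
    return M, multipliers, x

# The system of Examples 1.2 and 1.3.
Uc, multipliers, x = gaussian_elimination(
    [[1, -4, 2], [3, -2, 3], [8, -2, 9]], [-9, 7, 34])
print(Uc)
print(multipliers)
print(x)
```

The list `multipliers` records, in order, the multiple of the pivot row subtracted at each step, which makes it easy to compare the run with a hand calculation.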


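The steps of the elimination stage of Example 1.3 are

\[
\begin{aligned}
\mathbf{R}_{2a} &= \mathbf{R}_2 - 3\mathbf{R}_1, \\
\mathbf{R}_{3a} &= \mathbf{R}_3 - 8\mathbf{R}_1, \\
\mathbf{R}_{3b} &= \mathbf{R}_{3a} - 3\mathbf{R}_{2a}.
\end{aligned}
\]

The numbers 3, 8 and 3 are called multipliers. In general, to obtain, for
example, an equation $E_{2a}$ without a term in $x_1$ from

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots &= d_1, && E_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots &= d_2, && E_2 \\
a_{31}x_1 + a_{32}x_2 + \cdots &= d_3, && E_3
\end{aligned}
\]

where $a_{11} \neq 0$, we subtract $(a_{21}/a_{11})E_1$ from $E_2$. The number $a_{21}/a_{11}$ is
the multiplier.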


In forming a multiplier, we divide by a number, $a_{11}$ in the above
generalization. The number by which we divide is referred to as a pivot or pivot
element, and the row in which it lies is the pivot row. Looking again
at Example 1.3: in Stage 1(a) the multipliers are $3 = 3/1 = a_{21}/a_{11}$ and
$8 = 8/1 = a_{31}/a_{11}$, and the pivot is $a_{11} = 1$; in Stage 1(b) the multiplier is
$3 = 30/10 = a_{32}/a_{22}$ and the pivot is $a_{22} = 10$. In general, the $k$th pivot
is the number in the denominator of the multipliers in Stage 1($k$) of the
elimination stage. At the end of the elimination stage, the pivots comprise
the elements of the leading diagonal of $\mathbf{U}$.


Example 1.4 


Use the matrix form of the Gaussian elimination method to solve the
following simultaneous equations.

\[
\begin{aligned}
3x_1 + x_2 - x_3 &= 1 && E_1 \\
5x_1 + x_2 + 2x_3 &= 6 && E_2 \\
4x_1 - 2x_2 - 3x_3 &= 3 && E_3
\end{aligned}
\]




This procedure does not 
always work. We examine 
cases where it breaks down in 
the next subsection. 


Stage 1: elimination 
Stage 1(a) 

Stage 1(b) 

Stage 1(c),... 


Stage 2: back substitution 




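Solution

The augmented matrix representing these equations is as follows.

\[
\left[\begin{array}{rrr|r}
3 & 1 & -1 & 1 \\
5 & 1 & 2 & 6 \\
4 & -2 & -3 & 3
\end{array}\right]
\begin{array}{l}
\mathbf{R}_1 \\ \mathbf{R}_2 \\ \mathbf{R}_3
\end{array}
\]

The row operations used to solve this example are highlighted in the margin
below.

Stage 1(a) We reduce the elements below the leading diagonal in column 1
to zero. The pivot is $a_{11}$.

\[
\begin{array}{r}
{} \\ \mathbf{R}_2 - \tfrac{5}{3}\mathbf{R}_1 \\ \mathbf{R}_3 - \tfrac{4}{3}\mathbf{R}_1
\end{array}
\left[\begin{array}{rrr|r}
3 & 1 & -1 & 1 \\
0 & -\tfrac{2}{3} & \tfrac{11}{3} & \tfrac{13}{3} \\
0 & -\tfrac{10}{3} & -\tfrac{5}{3} & \tfrac{5}{3}
\end{array}\right]
\begin{array}{l}
\mathbf{R}_1 \\ \mathbf{R}_{2a} \\ \mathbf{R}_{3a}
\end{array}
\]

$\mathbf{R}_{2a} = \mathbf{R}_2 - \tfrac{5}{3}\mathbf{R}_1$

$\mathbf{R}_{3a} = \mathbf{R}_3 - \tfrac{4}{3}\mathbf{R}_1$

Stage 1(b) We reduce the element below the leading diagonal in column 2
to zero. The pivot is $a_{22}$.

\[
\begin{array}{r}
{} \\ {} \\ \mathbf{R}_{3a} - \dfrac{(-10/3)}{(-2/3)}\mathbf{R}_{2a}
\end{array}
\left[\begin{array}{rrr|r}
3 & 1 & -1 & 1 \\
0 & -\tfrac{2}{3} & \tfrac{11}{3} & \tfrac{13}{3} \\
0 & 0 & -20 & -20
\end{array}\right]
\begin{array}{l}
\mathbf{R}_1 \\ \mathbf{R}_{2a} \\ \mathbf{R}_{3b}
\end{array}
\]

$\mathbf{R}_{3b} = \mathbf{R}_{3a} - 5\mathbf{R}_{2a}$

Stage 2 The equations represented by the new matrix are

\[
\begin{aligned}
3x_1 + x_2 - x_3 &= 1, && E_1 \\
-\tfrac{2}{3}x_2 + \tfrac{11}{3}x_3 &= \tfrac{13}{3}, && E_{2a} \\
-20x_3 &= -20. && E_{3b}
\end{aligned}
\]

From $E_{3b}$, we have $x_3 = 1$. From $E_{2a}$, we have $-\tfrac{2}{3}x_2 + \tfrac{11}{3} = \tfrac{13}{3}$, so $x_2 = -1$.
From $E_1$, we have $3x_1 - 1 - 1 = 1$, so $x_1 = 1$. Hence the solution is
$x_1 = 1$, $x_2 = -1$, $x_3 = 1$.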


Exercise 1.3 


Use the matrix form of the Gaussian elimination method to solve the
following simultaneous equations.


3%, -— 42=5 EE 
*Exercise 1.4 


Use the matrix form of the Gaussian elimination method to solve the
following simultaneous equations.

\[
\begin{aligned}
x_1 + x_2 - x_3 &= 2 && E_1 \\
5x_1 + 2x_2 + 2x_3 &= 20 && E_2 \\
4x_1 - 2x_2 - 3x_3 &= 15 && E_3
\end{aligned}
\]


1.3 Special cases 


The previous examples and exercises may have led you to believe that
Procedure 1.1 will always be successful, but this is not the case. The procedure
will fail if at any stage of the calculation a pivot is zero. We shall see that 
sometimes it is possible to overcome this difficulty, but this is not so in every 
case. In the following example we point out some difficulties, and indicate 
whether they can be overcome. 


Example 1.5 


Consider the following four systems of linear equations. Try to solve them
using the matrix form of the Gaussian elimination method as given in
Procedure 1.1. In each case the method breaks down. Suggest, if possible, a
method of overcoming the difficulty, and hence determine the solution if one
exists.

(a)
\[
\begin{aligned}
10x_2 - 3x_3 &= 34 \\
x_1 - 4x_2 + 2x_3 &= -9 \\
2x_3 &= 4
\end{aligned}
\]

(b)
\[
\begin{aligned}
x_1 + 10x_2 - 3x_3 &= 8 \\
x_1 + 10x_2 + 2x_3 &= 13 \\
x_1 + 4x_2 + 2x_3 &= 7
\end{aligned}
\]

(c)
\[
\begin{aligned}
x_1 + 4x_2 - 3x_3 &= 2 \\
x_1 + 2x_2 + 2x_3 &= 5 \\
2x_1 + 2x_2 + 9x_3 &= 7
\end{aligned}
\]

(d)
\[
\begin{aligned}
x_1 + 4x_2 - 3x_3 &= 2 \\
x_1 + 2x_2 + 2x_3 &= 5 \\
2x_1 + 2x_2 + 9x_3 &= 13
\end{aligned}
\]

Solution 


(a) Since the first pivot $a_{11}$ is zero, there is no multiple of the first row
that we can subtract from the second row to eliminate the term in $x_1$.
However, interchanging two equations does not change the solution of
a system of equations. Hence, interchanging the first two equations
gives a system of equations in upper triangular form, from which we
can determine the solution $x_1 = 3$, $x_2 = 4$, $x_3 = 2$.


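(b) We begin by writing down the augmented matrix.

\[
\mathbf{A}|\mathbf{b} = \left[\begin{array}{rrr|r}
1 & 10 & -3 & 8 \\
1 & 10 & 2 & 13 \\
1 & 4 & 2 & 7
\end{array}\right]
\begin{array}{l}
\mathbf{R}_1 \\ \mathbf{R}_2 \\ \mathbf{R}_3
\end{array}
\]

Stage 1(a) We reduce the elements below the leading diagonal to zero,
starting with column 1.

\[
\begin{array}{r}
{} \\ \mathbf{R}_2 - \mathbf{R}_1 \\ \mathbf{R}_3 - \mathbf{R}_1
\end{array}
\left[\begin{array}{rrr|r}
1 & 10 & -3 & 8 \\
0 & 0 & 5 & 5 \\
0 & -6 & 5 & -1
\end{array}\right]
\begin{array}{l}
\mathbf{R}_1 \\ \mathbf{R}_{2a} \\ \mathbf{R}_{3a}
\end{array}
\]

Stage 1(b) We now want to reduce the element in column 2 that is
below the leading diagonal to zero. The difficulty here is that we
cannot subtract a multiple of $\mathbf{R}_{2a}$ from $\mathbf{R}_{3a}$ to eliminate the coefficient
of $x_2$, since the pivot $a_{22}$ is zero. We can overcome this difficulty by
interchanging $\mathbf{R}_{2a}$ and $\mathbf{R}_{3a}$.

\[
\begin{array}{r}
{} \\ {} \\ \mathbf{R}_{2a} \leftrightarrow \mathbf{R}_{3a}
\end{array}
\left[\begin{array}{rrr|r}
1 & 10 & -3 & 8 \\
0 & -6 & 5 & -1 \\
0 & 0 & 5 & 5
\end{array}\right]
\begin{array}{l}
\mathbf{R}_1 \\ \mathbf{R}_{2b} \\ \mathbf{R}_{3b}
\end{array}
\]

This is now in upper triangular form, and we can proceed to Stage 2
to determine the solution using back substitution. We find that the
solution is $x_1 = x_2 = x_3 = 1$.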


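(c) We begin by writing down the augmented matrix.

\[
\mathbf{A}|\mathbf{b} = \left[\begin{array}{rrr|r}
1 & 4 & -3 & 2 \\
1 & 2 & 2 & 5 \\
2 & 2 & 9 & 7
\end{array}\right]
\begin{array}{l}
\mathbf{R}_1 \\ \mathbf{R}_2 \\ \mathbf{R}_3
\end{array}
\]

Stage 1(a) We reduce the elements below the leading diagonal to zero,
starting with column 1.

\[
\begin{array}{r}
{} \\ \mathbf{R}_2 - \mathbf{R}_1 \\ \mathbf{R}_3 - 2\mathbf{R}_1
\end{array}
\left[\begin{array}{rrr|r}
1 & 4 & -3 & 2 \\
0 & -2 & 5 & 3 \\
0 & -6 & 15 & 3
\end{array}\right]
\begin{array}{l}
\mathbf{R}_1 \\ \mathbf{R}_{2a} \\ \mathbf{R}_{3a}
\end{array}
\]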


This may appear to be a 
trivial difficulty, but if we 
hope to devise a procedure 
that could be implemented on 
a computer, the process must 
allow for every contingency. 


The notation $\mathbf{R}_{2a} \leftrightarrow \mathbf{R}_{3a}$
indicates that we have
interchanged $\mathbf{R}_{2a}$ with $\mathbf{R}_{3a}$.
Such interchanges are also
called row operations.




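Stage 1(b) We reduce the element below the leading diagonal in
column 2 to zero.

\[
\begin{array}{r}
{} \\ {} \\ \mathbf{R}_{3a} - 3\mathbf{R}_{2a}
\end{array}
\left[\begin{array}{rrr|r}
1 & 4 & -3 & 2 \\
0 & -2 & 5 & 3 \\
0 & 0 & 0 & -6
\end{array}\right]
\begin{array}{l}
\mathbf{R}_1 \\ \mathbf{R}_{2a} \\ \mathbf{R}_{3b}
\end{array}
\]

Stage 2 We now try to solve the system of equations represented by the
above matrix by back substitution. The coefficient matrix is in upper
triangular form, but if we write out the system of equations as

\[
\begin{aligned}
x_1 + 4x_2 - 3x_3 &= 2, \\
-2x_2 + 5x_3 &= 3, \\
0x_3 &= -6,
\end{aligned}
\]

we see that the last equation has no solution since no value of $x_3$ can
give $0x_3 = -6$. Hence the system of equations has no solution.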


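(d) We begin by writing down the augmented matrix.

\[
\mathbf{A}|\mathbf{b} = \left[\begin{array}{rrr|r}
1 & 4 & -3 & 2 \\
1 & 2 & 2 & 5 \\
2 & 2 & 9 & 13
\end{array}\right]
\begin{array}{l}
\mathbf{R}_1 \\ \mathbf{R}_2 \\ \mathbf{R}_3
\end{array}
\]

Stage 1(a) We reduce the elements below the leading diagonal to zero,
starting with column 1.

\[
\begin{array}{r}
{} \\ \mathbf{R}_2 - \mathbf{R}_1 \\ \mathbf{R}_3 - 2\mathbf{R}_1
\end{array}
\left[\begin{array}{rrr|r}
1 & 4 & -3 & 2 \\
0 & -2 & 5 & 3 \\
0 & -6 & 15 & 9
\end{array}\right]
\begin{array}{l}
\mathbf{R}_1 \\ \mathbf{R}_{2a} \\ \mathbf{R}_{3a}
\end{array}
\]

We shall refer back to these steps in Example 1.6.

Stage 1(b) We reduce the element below the leading diagonal in
column 2 to zero.

\[
\begin{array}{r}
{} \\ {} \\ \mathbf{R}_{3a} - 3\mathbf{R}_{2a}
\end{array}
\left[\begin{array}{rrr|r}
1 & 4 & -3 & 2 \\
0 & -2 & 5 & 3 \\
0 & 0 & 0 & 0
\end{array}\right]
\begin{array}{l}
\mathbf{R}_1 \\ \mathbf{R}_{2a} \\ \mathbf{R}_{3b}
\end{array}
\]

Stage 2 We now try to solve the system of equations represented by the
above matrix by back substitution. The coefficient matrix is in upper
triangular form, but if we write out the system of equations as

\[
\begin{aligned}
x_1 + 4x_2 - 3x_3 &= 2, \\
-2x_2 + 5x_3 &= 3, \\
0x_3 &= 0,
\end{aligned}
\]

we see that any value of $x_3$ gives $0x_3 = 0$. If we let $x_3 = k$, where $k$
is an arbitrary number, then, proceeding with the back substitution,
we have $-2x_2 + 5k = 3$, giving $x_2 = -\tfrac{3}{2} + \tfrac{5}{2}k$. The first equation gives
$x_1 + (-6 + 10k) - 3k = 2$, so $x_1 = 8 - 7k$. So there is an infinite number
of solutions of the form $x_1 = 8 - 7k$, $x_2 = \tfrac{5}{2}k - \tfrac{3}{2}$, $x_3 = k$. The general
solution can be written as $[8 - 7k \;\;\; \tfrac{5}{2}k - \tfrac{3}{2} \;\;\; k]^T$, where $k$ is an arbitrary
number.

Systems of equations of this kind occur later in the course, so note the
method of solution used in this example.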


In Example 1.5 parts (a) and (b), we were able to overcome the difficulty of a 
zero pivot by making an essential row interchange. In general, whenever 
one of the pivots is zero, we interchange that row of the augmented matrix 
with the first available row below it that would lead to a non-zero pivot, 
effectively reordering the original system of equations. The difficulties in (a) 
and (b) are thus easily overcome, but those occurring in (c) and (d) are 
more fundamental. In (c) there was a zero pivot that could not be avoided 
by interchanging rows, and we were left with an inconsistent system of 
equations for which there is no solution. The final example (d) illustrates 
the case where a pivot is zero and the system of equations has an infinite 
number of solutions. It is these last two cases in particular that we 
explore in the next subsection. 
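The essential row interchange described above can be added to the elimination stage with a few extra lines. The Python sketch below is an illustration only: the function name `eliminate_with_interchange` is our own, and the choice of 'first available row below' follows the description in the text.

```python
from fractions import Fraction

def eliminate_with_interchange(M):
    """Reduce an augmented matrix to upper triangular form.

    When a pivot is zero, the pivot row is interchanged with the first row
    below it that has a non-zero entry in the pivot column.
    Returns the reduced matrix and the list of interchanges (1-based rows).
    """
    M = [[Fraction(v) for v in row] for row in M]
    n = len(M)
    swaps = []
    for k in range(n - 1):
        if M[k][k] == 0:
            below = [i for i in range(k + 1, n) if M[i][k] != 0]
            if not below:
                continue  # column already zero below the diagonal
            i = below[0]
            M[k], M[i] = M[i], M[k]
            swaps.append((k + 1, i + 1))
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            M[i] = [u - m * v for u, v in zip(M[i], M[k])]
    return M, swaps

# Example 1.5(b): the second pivot is zero after Stage 1(a).
example_b = [[1, 10, -3, 8],
             [1, 10, 2, 13],
             [1, 4, 2, 7]]
U, swaps = eliminate_with_interchange(example_b)
print(U)
print(swaps)
```

For Example 1.5(b) the single interchange reported is that of rows 2 and 3, and the reduced matrix is ready for back substitution.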
1.4 No solutions and an infinite number of solutions


We begin by looking at the general system of two linear equations in two
unknowns, given by

\[
\left\{
\begin{aligned}
ax + by &= e, && E_1 \\
cx + dy &= f. && E_2
\end{aligned}
\right.
\]

All the points satisfying $E_1$ lie on one straight line, while all the points
satisfying $E_2$ lie on another. The solution of the system of linear equations
can be described graphically as the coordinates of the point of intersection
of these two lines. However, if we draw two lines at random in a plane, there
are three situations that can arise, as illustrated in Figure 1.1.
(a) This is the typical case, where the two lines are not parallel and there 
is a unique solution to the system of linear equations, corresponding to 
the point of intersection of the two lines (see Figure 1.1(a)). 


(b) This is the special case where the two lines are parallel (and so do not
intersect) and the corresponding system of linear equations has no solution
(see Figure 1.1(b)).


(c) This is the very special case where the two lines coincide, so any point 
on the line satisfies both equations and we have an infinite number of 
solutions (see Figure 1.1(c)). 

Exercise 1.5


For each of the following pairs of linear equations, sketch their graphs and 
hence determine the number of solutions. 


r+ y=4 r+ y=4 
(a) ee (b) Ne ae 


r+ y=4 y=A4 
(c) Cae ae (d) oe 


*Exercise 1.6 


For the system of equations 
\[
\begin{aligned}
ax + by &= e, \\
cx + dy &= f,
\end{aligned}
\]


what is the condition for the two lines to be parallel? 


A linear equation involving three unknowns $x$, $y$ and $z$ of the form
\[ ax + by + cz = d, \]
where $a$, $b$, $c$ and $d$ are constants, can be represented graphically as a plane in
three-dimensional space. For a system of three equations in three unknowns, 
a graphical interpretation gives rise to three types of solution, as illustrated 
in Figure 1.2. 


(a) The three planes intersect at a single point, so there is a unique solution 
at the point of intersection (see Figure 1.2(a)). 


(b) The three planes form a tent shape, having no point of intersection and 
hence no solution (see Figure 1.2(b): the three lines where a pair of 
planes meet are parallel). A variant of this case occurs when two (or 
even all three) of the planes are parallel, and so cannot meet at all. 


(c) The three planes have (at least) a common line of intersection and hence 
an infinite number of solutions (see Figure 1.2(c)). 
Figure 1.1


Figure 1.2




The next example gives an algebraic interpretation of one of these three 
types of solution. For this example, the following definition is required. 


Definition 


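A linear combination of rows $\mathbf{R}_1, \mathbf{R}_2, \ldots, \mathbf{R}_n$ is a row $\mathbf{R}$ such that
$\mathbf{R} = q_1\mathbf{R}_1 + q_2\mathbf{R}_2 + \cdots + q_n\mathbf{R}_n$, where the $q_i$ ($i = 1, 2, \ldots, n$) are
numbers. A non-trivial linear combination is one where at least one $q_i$
is non-zero.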


Example 1.6 


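Consider the system of linear equations in Example 1.5(d), where the
corresponding augmented matrix is as follows.

\[
\mathbf{A}|\mathbf{b} = \left[\begin{array}{rrr|r}
1 & 4 & -3 & 2 \\
1 & 2 & 2 & 5 \\
2 & 2 & 9 & 13
\end{array}\right]
\begin{array}{l}
\mathbf{R}_1 \\ \mathbf{R}_2 \\ \mathbf{R}_3
\end{array}
\]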


Show that there is a non-trivial linear combination of the rows of this matrix 
that is equal to a row of zeros. 


Solution 


We use the results of the elimination process in Example 1.5(d), noting that
$\mathbf{R}_{3b}$ is equal to a row of zeros. From this, we see that

\[
\mathbf{R}_{3b} = \mathbf{R}_{3a} - 3\mathbf{R}_{2a} = (\mathbf{R}_3 - 2\mathbf{R}_1) - 3(\mathbf{R}_2 - \mathbf{R}_1) = \mathbf{R}_1 - 3\mathbf{R}_2 + \mathbf{R}_3 = \mathbf{0}.
\]

Hence there is a non-trivial linear combination of the rows that is equal to
the zero row.
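The linear combination found in Example 1.6 is easy to check numerically. In the Python sketch below, the helper name `linear_combination` is our own; the rows are those of the augmented matrix of Example 1.5(d).

```python
R1 = [1, 4, -3, 2]
R2 = [1, 2, 2, 5]
R3 = [2, 2, 9, 13]

def linear_combination(coeffs, rows):
    """Return q1*R1 + q2*R2 + ... + qn*Rn, computed entry by entry."""
    width = len(rows[0])
    return [sum(q * r[j] for q, r in zip(coeffs, rows)) for j in range(width)]

# The combination R1 - 3*R2 + R3.
result = linear_combination([1, -3, 1], [R1, R2, R3])
print(result)
```

Every entry of the printed row is zero, in agreement with the calculation above.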


In Example 1.6 we saw that $\mathbf{R}_{3b} = \mathbf{R}_1 - 3\mathbf{R}_2 + \mathbf{R}_3$, which means that $\mathbf{R}_{3b}$
is a linear combination of the rows of $\mathbf{A}|\mathbf{b}$. However, in the above example,
we have something more: such a linear combination produces a row of zeros
(and the corresponding equations have an infinite number of solutions, as
we found in Example 1.5(d)). When such a relationship exists between the
rows of a matrix, we say that the rows are linearly dependent.

This is the case where the three planes, corresponding to
the three equations, meet in a
line (see Figure 1.2(c)).


Definition 

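The rows $\mathbf{R}_1, \mathbf{R}_2, \ldots, \mathbf{R}_n$ of a matrix are linearly dependent if a
non-trivial linear combination of these rows is equal to the zero row,
i.e.

\[ q_1\mathbf{R}_1 + q_2\mathbf{R}_2 + \cdots + q_n\mathbf{R}_n = \mathbf{0}, \]
where the numbers $q_1, q_2, \ldots, q_n$ are not all zero.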


Rows that are not linearly dependent are linearly independent. 


If the rows of the matrix A are linearly independent, then the Gaussian 
elimination method works and produces a unique solution to the system of 
equations Ax = b. However, if the rows of A are linearly dependent, then 
the corresponding system of linear equations may have an infinite number 
of solutions, or no solution. The elimination process will reveal which is the 
case. 
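The way in which the elimination process reveals the type of solution can be sketched in Python. This is a minimal illustration, assuming exact arithmetic; the function name `classify` and the wording of its return values are our own. It reduces $\mathbf{A}|\mathbf{b}$ using row interchanges where needed, then inspects any rows whose coefficients have all become zero.

```python
from fractions import Fraction

def classify(A, b):
    """Classify the solutions of A x = b for an n x n matrix A."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    row = 0  # number of pivots found so far
    for col in range(n):
        candidates = [i for i in range(row, n) if M[i][col] != 0]
        if not candidates:
            continue  # no non-zero pivot is available in this column
        p = candidates[0]
        M[row], M[p] = M[p], M[row]
        for i in range(row + 1, n):
            m = M[i][col] / M[row][col]
            M[i] = [u - m * v for u, v in zip(M[i], M[row])]
        row += 1
    # Rows from index `row` onwards now have zero coefficients throughout.
    if any(r[n] != 0 for r in M[row:]):
        return "no solution"
    return "unique solution" if row == n else "infinite number of solutions"

# Examples 1.5(b), (c) and (d).
print(classify([[1, 10, -3], [1, 10, 2], [1, 4, 2]], [8, 13, 7]))
print(classify([[1, 4, -3], [1, 2, 2], [2, 2, 9]], [2, 5, 7]))
print(classify([[1, 4, -3], [1, 2, 2], [2, 2, 9]], [2, 5, 13]))
```

A row of zero coefficients with a non-zero right-hand side signals an inconsistent system, while a complete row of zeros signals linearly dependent rows and an infinite number of solutions.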
End-of-section Exercise


*Exercise 1.7 


In the following systems of linear equations, determine which has a single
solution, which has an infinite number of solutions, and which has no
solution. Where the equations have a unique solution, find it. Where the
equations have an infinite number of solutions, find the general solution and
a non-trivial linear combination of the rows of $\mathbf{A}|\mathbf{b}$ that gives a row of zeros.

(a)
\[
\begin{aligned}
x_1 - 2x_2 + 5x_3 &= 7 \\
x_1 + 3x_2 - 4x_3 &= 20 \\
x_1 + 18x_2 - 31x_3 &= 40
\end{aligned}
\]

(b)
\[
\begin{aligned}
x_1 - 2x_2 + 5x_3 &= 6 \\
x_1 + 3x_2 - 4x_3 &= 7 \\
2x_1 + 6x_2 - 12x_3 &= 12
\end{aligned}
\]

(c)
\[
\begin{aligned}
x_1 - 4x_2 + x_3 &= 14 \\
5x_1 - x_2 - x_3 &= 2 \\
6x_1 + 14x_2 - 6x_3 &= -52
\end{aligned}
\]


2 Properties of matrices 


In this section we review the algebraic properties of matrices, and show 
how solving the matrix equation Ax = b can be interpreted as finding the 
vector x that is mapped to the vector b by the transformation defined by A. 
Then we investigate a related number, called the determinant of the matrix, 
that can be used to decide whether a given system of linear equations has a 
unique solution. Finally, we look at some applications of determinants. 


2.1 Algebra of matrices 


A matrix of order or size $m \times n$ is a rectangular array of elements (usually
real numbers) with $m$ rows and $n$ columns. If $m = n$, the matrix is a
square matrix. If $m = 1$, the matrix can be regarded as a row vector, an
example of which is the $1 \times 3$ matrix $[2 \;\; 3 \;\; 4]$. If $n = 1$, the matrix
can be regarded as a column vector, an example of which is the $2 \times 1$
matrix $\begin{bmatrix} 5 \\ 7 \end{bmatrix}$, which we often write in text as
$[5 \;\; 7]^T$. The general $m \times n$ matrix $\mathbf{A}$ can be written as $[a_{ij}]$ to
denote the matrix

$$\mathbf{A} = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}.$$

Two matrices $\mathbf{A} = [a_{ij}]$ and $\mathbf{B} = [b_{ij}]$ are equal if they have the same order
and $a_{ij} = b_{ij}$ for all $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$. If all the elements of
a matrix are zero, the matrix is the zero matrix, denoted by $\mathbf{0}$.

Matrices were first used in 
1858 by the English 
mathematician Arthur 
Cayley. 


You met these notations for 
column vectors in Unit 4. 


The element $a_{ij}$ is in the $i$th 
row and the $j$th column. 


Strictly speaking, we should 
write the $m \times n$ zero matrix 
as $\mathbf{0}_{mn}$, but the size of the 
zero matrix will always be 
clear from the context. 


Addition and scalar multiplication of m x n matrices 


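If $\mathbf{A} = [a_{ij}]$ and $\mathbf{B} = [b_{ij}]$ are matrices of the same order, we can form
the sum

$$\mathbf{A} + \mathbf{B} = [a_{ij} + b_{ij}] \quad \text{(component-wise addition)}.$$

For any matrix $\mathbf{A}$, and $\mathbf{0}$ of the same order,

$$\mathbf{A} + \mathbf{0} = \mathbf{A} \quad \text{(the components are unaltered)}.$$

The scalar multiple of a matrix $\mathbf{A} = [a_{ij}]$ by a number $k$ is given by

$$k\mathbf{A} = [k a_{ij}] \quad \text{(multiply each component by } k\text{)}.$$

The negative of the matrix $\mathbf{A} = [a_{ij}]$ is $-\mathbf{A} = [-a_{ij}]$, so

$$\mathbf{A} + (-\mathbf{A}) = \mathbf{0}.$$

For two matrices $\mathbf{A}$ and $\mathbf{B}$ of the same order, $\mathbf{A} - \mathbf{B}$ is given by
$\mathbf{A} + (-\mathbf{B})$.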
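
The rules in the box above can be tried out in a few lines of Python. This is only an illustrative sketch: it assumes a matrix is stored as a list of rows, and the helper names `mat_add`, `scalar_mul` and `mat_sub` are introduced here for illustration (they are not part of the course software).

```python
def mat_add(A, B):
    """Component-wise sum of two matrices of the same order."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("matrices must have the same order")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def scalar_mul(k, A):
    """Multiply each component of A by the number k."""
    return [[k * a for a in row] for row in A]


def mat_sub(A, B):
    """A - B, formed as A + (-B)."""
    return mat_add(A, scalar_mul(-1, B))


A = [[1, 2], [3, 4]]
Z = [[0, 0], [0, 0]]           # zero matrix of the same order as A
print(mat_add(A, Z) == A)      # adding the zero matrix leaves A unaltered
print(mat_sub(A, A) == Z)      # A + (-A) is the zero matrix
```

Attempting to add matrices of different orders raises an error in this sketch, mirroring the requirement that the two matrices have the same order.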


Exercise 2.1 


1 2 3 1 3.47 —2 1 -6 
Let A= || 5 |. B=|4 Ls | and C= | 7 3 AE 


(a) Calculate 3A and —A. 
(b) Calculate A+ B and B+C. 


(c) Use the results of part (b) to verify that (A +B)+C=A+(B+C). 


Exercise 2.2 


Let A= E | and B = - 2) 


(a) Calculate $\mathbf{A} - \mathbf{B}$ and $\mathbf{B} - \mathbf{A}$. 
(b) Verify that $\mathbf{B} - \mathbf{A} = -(\mathbf{A} - \mathbf{B})$. 


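*Exercise 2.3 


Let $\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$ and $\mathbf{B} = \begin{bmatrix} 1 & 3 & 7 \\ 4 & -5 & 0 \end{bmatrix}$.


(a) Calculate $2\mathbf{A} - 5\mathbf{B}$. 
(b) Verify that $3(\mathbf{A} + \mathbf{B}) = 3\mathbf{A} + 3\mathbf{B}$. 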


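The multiplication of matrices is more complicated. We illustrate the method
by forming the product of two $2 \times 2$ matrices.


Let $\mathbf{A} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and $\mathbf{B} = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}$. To form the product $\mathbf{AB} = \mathbf{C}$, we define
$c_{ij}$ using the $i$th row of $\mathbf{A}$ and the $j$th column of $\mathbf{B}$, so that

$$\begin{aligned}
c_{11} &= a_{11}b_{11} + a_{12}b_{21} = (1 \times 5) + (2 \times 7) = 19,\\
c_{12} &= a_{11}b_{12} + a_{12}b_{22} = (1 \times 6) + (2 \times 8) = 22,\\
c_{21} &= a_{21}b_{11} + a_{22}b_{21} = (3 \times 5) + (4 \times 7) = 43,\\
c_{22} &= a_{21}b_{12} + a_{22}b_{22} = (3 \times 6) + (4 \times 8) = 50.
\end{aligned}$$

Thus

$$\mathbf{C} = \mathbf{AB} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}.$$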




Matrix addition is 
commutative, 
A+B=B+A, 

and associative, 
(A+B)+C=A+(B+C). 


In this context, the number k 
is sometimes referred to as a 
scalar in order to distinguish 
it from a matrix. The same 
idea was used in Unit 4 in the 
context of vectors. 
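
Similarly, if $\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$ and $\mathbf{B} = \begin{bmatrix} 2 & 4 \\ -1 & -2 \\ 5 & 3 \end{bmatrix}$, then

$$\begin{aligned}
\mathbf{AB} &= \begin{bmatrix}
(1 \times 2) + (2 \times (-1)) + (3 \times 5) & (1 \times 4) + (2 \times (-2)) + (3 \times 3) \\
(4 \times 2) + (5 \times (-1)) + (6 \times 5) & (4 \times 4) + (5 \times (-2)) + (6 \times 3)
\end{bmatrix} \\
&= \begin{bmatrix} 15 & 9 \\ 33 & 24 \end{bmatrix}.
\end{aligned}$$
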
The above procedure can be carried out only when the number of columns 
of A is equal to the number of rows of B. 


Matrix multiplication 


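The product of an $m \times p$ matrix $\mathbf{A}$ and a $p \times n$ matrix $\mathbf{B}$ is the $m \times n$
matrix $\mathbf{C} = \mathbf{AB}$, where $c_{ij}$ is formed using the $i$th row of $\mathbf{A}$ and the
$j$th column of $\mathbf{B}$, so that

$$c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{ip}b_{pj}.$$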
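
The definition in the box translates directly into code. The following Python sketch is illustrative only; it assumes matrices are stored as lists of rows, and the function name `mat_mul` is a name chosen here rather than anything from the course software.

```python
def mat_mul(A, B):
    """Product of an m x p matrix A and a p x n matrix B (lists of rows)."""
    p = len(B)
    if any(len(row) != p for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    n = len(B[0])
    # c_ij is built from the ith row of A and the jth column of B
    return [[sum(A[i][k] * B[k][j] for k in range(p)) for j in range(n)]
            for i in range(len(A))]


# The two worked products from the text above
print(mat_mul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
print(mat_mul([[1, 2, 3], [4, 5, 6]], [[2, 4], [-1, -2], [5, 3]]))
```

The size check at the top reflects the condition that the product can be formed only when the number of columns of the first matrix equals the number of rows of the second.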


*Exercise 2.4 


Calculate the following matrix products, where they exist. 


@ |e alle al © 2 Jfos] © [2] ° -4 


—2 0 1 2 
(d) le : | i Ol la bi | 
4 1 -l —1 


Exercise 2.5 


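Calculate $\mathbf{Ax}$ when $\mathbf{A} = \begin{bmatrix} 3 & -1 & 5 \\ 6 & 4 & 7 \\ 2 & -3 & 0 \end{bmatrix}$ and $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$, and hence show that
the equation $\mathbf{Ax} = \mathbf{b}$, where $\mathbf{b} = [2 \;\; 5 \;\; 6]^T$, is equivalent to the system of
equations

$$\begin{aligned}
3x_1 - x_2 + 5x_3 &= 2,\\
6x_1 + 4x_2 + 7x_3 &= 5,\\
2x_1 - 3x_2 \phantom{{}+ 7x_3} &= 6.
\end{aligned}$$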


Earlier, you saw that addition of matrices is commutative and associative. 
We now give the rules for matrix multiplication. 


Rules of matrix multiplication 


For any matrices A, B and C of appropriate sizes, matrix multiplica- 
tion is associative, i.e. 


(AB)C = A(BC), 
and distributive over matrix addition, i.e. 


A(B+C)=AB+AC. 

In Section 1 we referred to 
Ax = b as a convenient 
representation of a system of 
equations. This is consistent 
with the interpretation of Ax 
as the matrix product of A 
with x, as we show here. 


The phrase ‘of appropriate 
sizes’ means that all the 
matrix sums and products 
can be formed. 

In general, matrix 
multiplication is not 
commutative, so AB may not 
be equal to BA. But 
multiplication of numbers is 
commutative, i.e. ab = ba, for 
any numbers a and b. So this 
is a significant difference 
between the algebra of 
matrices and the algebra of 
numbers. 




Exercise 2.6 


1 1 1 4 2 0 
wet a=[5 3)B=|2 t]aac=|i s|- 


Verify each of the following statements. 
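(a) $\mathbf{AB} \neq \mathbf{BA}$ (b) $(\mathbf{AB})\mathbf{C} = \mathbf{A}(\mathbf{BC})$ (c) $\mathbf{A}(\mathbf{B} + \mathbf{C}) = \mathbf{AB} + \mathbf{AC}$ 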


For a square matrix $\mathbf{A}$, we define powers of the matrix in the obvious way: 
$\mathbf{A}^2 = \mathbf{AA}$, $\mathbf{A}^3 = \mathbf{AAA}$, and so on. 


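An operation that we can apply to any matrix $\mathbf{A}$ is to form its transpose $\mathbf{A}^T$
by interchanging its rows and columns. Thus the rows of $\mathbf{A}^T$ are the columns
of $\mathbf{A}$, and the columns of $\mathbf{A}^T$ are the rows of $\mathbf{A}$, taken in the same order. If
$\mathbf{A}$ is an $m \times n$ matrix, then $\mathbf{A}^T$ is an $n \times m$ matrix. Examples of transposes
are

$$\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}^T = \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{bmatrix}, \qquad
\begin{bmatrix} 2 & 7 \\ -6 & 1 \\ 0 & 4 \end{bmatrix}^T = \begin{bmatrix} 2 & -6 & 0 \\ 7 & 1 & 4 \end{bmatrix}.$$


$\mathbf{A}^T$ is read as ‘A transpose’.


If we denote $\mathbf{A}$ by $[a_{ij}]$ and $\mathbf{A}^T$ by $[a^T_{ij}]$, then $a^T_{ij} = a_{ji}$.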


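Rules for transposes of matrices 

For any matrix $\mathbf{A}$,

$$(\mathbf{A}^T)^T = \mathbf{A}.$$

For any matrices $\mathbf{A}$ and $\mathbf{B}$ of the same size,

$$(\mathbf{A} + \mathbf{B})^T = \mathbf{A}^T + \mathbf{B}^T.$$

If $\mathbf{A}$ is an $m \times p$ matrix and $\mathbf{B}$ is a $p \times n$ matrix, then

$$(\mathbf{AB})^T = \mathbf{B}^T\mathbf{A}^T.$$


Notice the change in order of 
the terms involving A and B. 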


Notice in passing that the dot product of two vectors can be written using
matrix multiplication: if $\mathbf{a} = [a_1 \;\; a_2 \;\; a_3]^T$ and $\mathbf{b} = [b_1 \;\; b_2 \;\; b_3]^T$, then

$$\mathbf{a} \cdot \mathbf{b} = a_1b_1 + a_2b_2 + a_3b_3 = \begin{bmatrix} a_1 & a_2 & a_3 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \mathbf{a}^T\mathbf{b}.$$

This fact will turn out to be extremely useful when we come to discuss vector
calculus later in the course.


Remember that a vector is 
simply a matrix with one 
column. 


A square matrix $\mathbf{A}$ is symmetric if $\mathbf{A} = \mathbf{A}^T$. Symmetric here refers to 


1 2 3 
F ; : 1 2 
symmetry about the leading diagonal. The matrices E ‘| and|2 3 4 
are examples of symmetric matrices. 3.4 5 


Exercise 2.7 


1 2 2 5 1 0 
Let A= |3 4]},B=j]-1 -4 and C= | AR 
5 6 3.0 Ol 


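(a) Write down $\mathbf{A}^T$, $\mathbf{B}^T$ and $\mathbf{C}^T$. 
(b) Verify that $(\mathbf{A} + \mathbf{B})^T = \mathbf{A}^T + \mathbf{B}^T$. 
(c) Verify that $(\mathbf{AC})^T = \mathbf{C}^T\mathbf{A}^T$. 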

Of particular importance in the solution of simultaneous linear equations are 
the triangular matrices. An upper triangular matrix is a square matrix 
in which each entry below the leading diagonal is 0. A lower triangular 
matrix is a square matrix in which each entry above the leading diagonal 
is 0. A diagonal matrix is a square matrix where all the elements off the 
leading diagonal are 0. A matrix that is upper triangular, lower triangular or 
both (i.e. diagonal) is sometimes referred to simply as a triangular matrix. 
For example, the matrix on the left below is upper triangular, the one in 
the middle is lower triangular, and the one on the right is diagonal: 


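$$\begin{bmatrix} 1 & 2 & 8 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{bmatrix}, \qquad
\begin{bmatrix} 1 & 0 & 0 \\ 2 & 4 & 0 \\ 0 & 5 & 6 \end{bmatrix}, \qquad
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 6 \end{bmatrix}.$$


Exercise 2.8 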


For each of the following matrices, state whether it is upper triangular, lower 


triangular, diagonal, or none of these. 
1 10 00 1 6 0 0 
(a) |}0 0 8 (b) |O 1 2 (c) [0 5 O 
O: De.3 12 3 00 4 


Earlier, you met the $m \times n$ zero matrix $\mathbf{0}$, which has the property that
$\mathbf{A} + \mathbf{0} = \mathbf{A}$ for each $m \times n$ matrix $\mathbf{A}$. The analogue for matrix multiplication
is the $n \times n$ identity matrix $\mathbf{I}$, which is a diagonal matrix where each
diagonal entry is 1. For example, the $3 \times 3$ identity matrix is

$$\mathbf{I} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

If $\mathbf{A}$ is an $n \times n$ matrix and $\mathbf{I}$ is the $n \times n$ identity matrix, then

$$\mathbf{IA} = \mathbf{AI} = \mathbf{A}.$$

If there exists a matrix $\mathbf{B}$ such that $\mathbf{AB} = \mathbf{BA} = \mathbf{I}$, then $\mathbf{B}$ is called the
inverse of $\mathbf{A}$, and we write $\mathbf{B} = \mathbf{A}^{-1}$. Only square matrices can have
inverses. A matrix that has an inverse is called invertible (and a matrix
that does not have an inverse is called non-invertible!).


Exercise 2.9 


For each of the following pairs of matrices $\mathbf{A}$ and $\mathbf{B}$, calculate $\mathbf{AB}$ and 
deduce that $\mathbf{B} = \mathbf{A}^{-1}$. 


8 _1 
(a) A= E :| and B = : [ 
9 9 
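(b) $\mathbf{A} = \begin{bmatrix} 1 & 3 & -1 \\ -2 & -5 & 1 \\ 4 & 11 & -2 \end{bmatrix}$ and $\mathbf{B} = \begin{bmatrix} -1 & -5 & -2 \\ 0 & 2 & 1 \\ -2 & 1 & 1 \end{bmatrix}$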


Finding the inverse of an invertible matrix 


There is a way to compute the inverse of an invertible square matrix using 
row operations similar to those that you used for Gaussian elimination. An 
example will make the method clear. 

You met examples of an 
upper triangular matrix U in 
Section 1. 


The transpose of an upper 

triangular matrix is a lower 
triangular matrix, and vice 
versa. 


Strictly speaking, we should 
write the $n \times n$ identity 
matrix as $\mathbf{I}_n$, but the size of 
any identity matrix will be 
clear from the context. 


$\mathbf{A}^{-1}$ is read as ‘A inverse’. 


An invertible matrix is 
sometimes called 
non-singular, whilst a 
non-invertible matrix is called 
singular. 




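Example 2.1 

Find the inverse of the invertible matrix

$$\mathbf{A} = \begin{bmatrix} 2 & 2 & -1 \\ 3 & 5 & 1 \\ 1 & 2 & 1 \end{bmatrix}.$$


Solution 

Form the augmented $3 \times 6$ matrix

$$\left[\begin{array}{rrr|rrr}
2 & 2 & -1 & 1 & 0 & 0 \\
3 & 5 & 1 & 0 & 1 & 0 \\
1 & 2 & 1 & 0 & 0 & 1
\end{array}\right]
\begin{array}{l} \mathbf{R}_1 \\ \mathbf{R}_2 \\ \mathbf{R}_3 \end{array}$$

consisting of $\mathbf{A}$ together with the identity matrix. Then perform row operations
in order to reduce the left-hand matrix to the identity, as follows.


Stage 1(a) We reduce the elements below the leading diagonal in column 1
to zero.

$$\begin{array}{r} {} \\ \mathbf{R}_2 - \tfrac{3}{2}\mathbf{R}_1 \\ \mathbf{R}_3 - \tfrac{1}{2}\mathbf{R}_1 \end{array}
\left[\begin{array}{rrr|rrr}
2 & 2 & -1 & 1 & 0 & 0 \\
0 & 2 & \tfrac{5}{2} & -\tfrac{3}{2} & 1 & 0 \\
0 & 1 & \tfrac{3}{2} & -\tfrac{1}{2} & 0 & 1
\end{array}\right]
\begin{array}{l} \mathbf{R}_1 \\ \mathbf{R}_{2a} \\ \mathbf{R}_{3a} \end{array}$$


Stage 1(b) We reduce the element below the leading diagonal in column 2
to zero.

$$\begin{array}{r} {} \\ {} \\ \mathbf{R}_{3a} - \tfrac{1}{2}\mathbf{R}_{2a} \end{array}
\left[\begin{array}{rrr|rrr}
2 & 2 & -1 & 1 & 0 & 0 \\
0 & 2 & \tfrac{5}{2} & -\tfrac{3}{2} & 1 & 0 \\
0 & 0 & \tfrac{1}{4} & \tfrac{1}{4} & -\tfrac{1}{2} & 1
\end{array}\right]
\begin{array}{l} \mathbf{R}_1 \\ \mathbf{R}_{2a} \\ \mathbf{R}_{3b} \end{array}$$

Note that the left-hand matrix is now in upper triangular form.


Stage 2(a) We adjust the element at the bottom of column 3 to one.

$$\begin{array}{r} {} \\ {} \\ 4\mathbf{R}_{3b} \end{array}
\left[\begin{array}{rrr|rrr}
2 & 2 & -1 & 1 & 0 & 0 \\
0 & 2 & \tfrac{5}{2} & -\tfrac{3}{2} & 1 & 0 \\
0 & 0 & 1 & 1 & -2 & 4
\end{array}\right]
\begin{array}{l} \mathbf{R}_1 \\ \mathbf{R}_{2a} \\ \mathbf{R}_{3c} \end{array}$$


Stage 2(b) We reduce the elements above the leading diagonal in column 3
to zero.

$$\begin{array}{r} \mathbf{R}_1 + \mathbf{R}_{3c} \\ \mathbf{R}_{2a} - \tfrac{5}{2}\mathbf{R}_{3c} \\ {} \end{array}
\left[\begin{array}{rrr|rrr}
2 & 2 & 0 & 2 & -2 & 4 \\
0 & 2 & 0 & -4 & 6 & -10 \\
0 & 0 & 1 & 1 & -2 & 4
\end{array}\right]
\begin{array}{l} \mathbf{R}_{1a} \\ \mathbf{R}_{2b} \\ \mathbf{R}_{3c} \end{array}$$


Stage 2(c) We adjust the element on the leading diagonal in column 2 to
one.

$$\begin{array}{r} {} \\ \tfrac{1}{2}\mathbf{R}_{2b} \\ {} \end{array}
\left[\begin{array}{rrr|rrr}
2 & 2 & 0 & 2 & -2 & 4 \\
0 & 1 & 0 & -2 & 3 & -5 \\
0 & 0 & 1 & 1 & -2 & 4
\end{array}\right]
\begin{array}{l} \mathbf{R}_{1a} \\ \mathbf{R}_{2c} \\ \mathbf{R}_{3c} \end{array}$$


Stage 2(d) We reduce the element at the top of column 2 to zero.

$$\begin{array}{r} \mathbf{R}_{1a} - 2\mathbf{R}_{2c} \\ {} \\ {} \end{array}
\left[\begin{array}{rrr|rrr}
2 & 0 & 0 & 6 & -8 & 14 \\
0 & 1 & 0 & -2 & 3 & -5 \\
0 & 0 & 1 & 1 & -2 & 4
\end{array}\right]
\begin{array}{l} \mathbf{R}_{1b} \\ \mathbf{R}_{2c} \\ \mathbf{R}_{3c} \end{array}$$


Stage 2(e) We adjust the element at the top of column 1 to one.

$$\begin{array}{r} \tfrac{1}{2}\mathbf{R}_{1b} \\ {} \\ {} \end{array}
\left[\begin{array}{rrr|rrr}
1 & 0 & 0 & 3 & -4 & 7 \\
0 & 1 & 0 & -2 & 3 & -5 \\
0 & 0 & 1 & 1 & -2 & 4
\end{array}\right]
\begin{array}{l} \mathbf{R}_{1c} \\ \mathbf{R}_{2c} \\ \mathbf{R}_{3c} \end{array}$$


The resulting matrix on the right-hand side is the required inverse,

$$\mathbf{A}^{-1} = \begin{bmatrix} 3 & -4 & 7 \\ -2 & 3 & -5 \\ 1 & -2 & 4 \end{bmatrix},$$

as you can readily check.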


This technique extends to larger square matrices, but because it is rather 
inefficient, it is not widely used. 


Procedure 2.1 Finding the inverse of an invertible square matrix 


To find the inverse of an invertible square matrix A, carry out the 
following steps. 


(a) Form the augmented matrix A|I, where I is the identity matrix of 
the same size as A. 


(b) Use row operations to reduce the left-hand side to the identity 
matrix I. 


(c) The matrix on the right-hand side is the inverse of A. 
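
Procedure 2.1 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the course's own implementation: the function name `inverse` is chosen here, exact arithmetic is obtained with the standard-library `fractions` module, and the row operations are applied column by column (each pivot is scaled to 1 and its column is cleared above and below), which is a different ordering of the same kind of row operations used in Example 2.1. Rows are interchanged when a pivot is zero.

```python
from fractions import Fraction


def inverse(A):
    """Inverse of a square matrix (list of rows) by row-reducing A|I to I|A^-1."""
    n = len(A)
    # Step (a): form the augmented matrix A|I with exact rational entries
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    # Step (b): reduce the left-hand side to the identity
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is non-invertible")
        M[col], M[pivot] = M[pivot], M[col]      # row interchange if needed
        pv = M[col][col]
        M[col] = [x / pv for x in M[col]]        # scale the pivot to 1
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    # Step (c): the right-hand side is the inverse
    return [row[n:] for row in M]


A = [[2, 2, -1], [3, 5, 1], [1, 2, 1]]   # the matrix of Example 2.1
for row in inverse(A):
    print([str(x) for x in row])
```

For a matrix whose rows are linearly dependent, no non-zero pivot can be found at some stage and the sketch reports that the matrix is non-invertible.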


*Exercise 2.10 


Use Procedure 2.1 to find the inverse of a general 2 x 2 matrix 


$$\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad (ad - bc \neq 0).$$


Note that you will have to treat a = 0 as a special case. 


The existence (or otherwise) of the inverse of a given square matrix A de- 
pends solely on the value of a single number called the determinant of A, 
written det A. For a 2 x 2 matrix, Exercise 2.10 yields the following result 
(which saves working through Procedure 2.1). 


Inverse of an invertible 2 x 2 matrix 


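If $\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, then the determinant of $\mathbf{A}$ is $\det \mathbf{A} = ad - bc$.


If $\det \mathbf{A} \neq 0$, then $\mathbf{A}$ is invertible and

$$\mathbf{A}^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$$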
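
As a quick check of the boxed formula, here is a small Python sketch. The names `det2` and `inverse2` are hypothetical helpers introduced for this illustration, and the sketch assumes integer entries so that the standard-library `Fraction` type can keep the arithmetic exact.

```python
from fractions import Fraction


def det2(A):
    """Determinant ad - bc of a 2 x 2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = A
    return a * d - b * c


def inverse2(A):
    """Inverse of a 2 x 2 integer matrix: swap the diagonal entries,
    negate the other two, and divide each entry by the determinant."""
    (a, b), (c, d) = A
    det = det2(A)
    if det == 0:
        raise ValueError("det A = 0, so A is non-invertible")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


A = [[1, 2], [3, 4]]
print(det2(A))
print([[str(x) for x in row] for row in inverse2(A)])
```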


We shall see shortly that it is possible to define det A for all square matrices, 
and the following result holds (although we shall not prove it). 


Condition for invertibility of a matrix A 
A matrix $\mathbf{A}$ is invertible if and only if $\det \mathbf{A} \neq 0$. 


*Exercise 2.11 


For each of the following $2 \times 2$ matrices $\mathbf{A}$, calculate $\det \mathbf{A}$ and determine
$\mathbf{A}^{-1}$, if it exists. 


wa-[] mas] oa-[2y 

Row interchanges will be 
necessary if one or more of 
the pivots is zero. 


It is sometimes convenient to 
write det(A) rather than 

det A. We study 
determinants in 

Subsection 2.3. 


So to find the inverse of a 

2 x 2 matrix, interchange the 
diagonal entries, take the 
negatives of the other two 
entries, and divide each 
resulting entry by the 
determinant. You may like to 
check that $\mathbf{AA}^{-1} = \mathbf{I}$. 


This is equivalent to saying 
that a matrix A is 
non-invertible if and only if 
det A = 0. 




Properties of invertible matrices 


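The inverse of an invertible matrix $\mathbf{A}$ is unique (i.e. if $\mathbf{AB} = \mathbf{I}$ and
$\mathbf{AC} = \mathbf{I}$, then $\mathbf{B} = \mathbf{C} = \mathbf{A}^{-1}$).

If $\mathbf{AB} = \mathbf{I}$, then $\mathbf{BA} = \mathbf{I}$, so $\mathbf{AA}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$, and hence the inverse
of $\mathbf{A}^{-1}$ is $\mathbf{A}$.


The rows of a square matrix are linearly independent if and only if the
matrix is invertible.


If $\mathbf{A}$ and $\mathbf{B}$ are invertible matrices of the same size, then $\mathbf{AB}$ is invertible
and $(\mathbf{AB})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$.


We do not prove these 
properties here. 


Notice the change in order of 
the terms involving A and B. 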


Exercise 2.12 


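(a) Show that if $\mathbf{A}$ and $\mathbf{B}$ are any two invertible square matrices of the same size, 
then $(\mathbf{AB})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$. 


(b) Find $\mathbf{A}^{-1}$ and $\mathbf{B}^{-1}$. 
(c) Verify that $(\mathbf{A} + \mathbf{B})^{-1} \neq \mathbf{A}^{-1} + \mathbf{B}^{-1}$. 
(d) Verify that $(\mathbf{AB})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$. 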


2.2 Linear transformations of the (x, y)-plane 


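Linear transformations of the (x, y)-plane provide examples of a use of matrices.


You will see further examples 
in Unit 10. 


Definition 

A linear transformation of the plane is a function that maps a two-dimensional
vector $[x \;\; y]^T$ to the image vector $[ax + by \;\;\; cx + dy]^T$,
where $a$, $b$, $c$ and $d$ are real numbers.


The transformation is called 
linear because straight lines 
are mapped to straight lines. 


We can represent any such linear transformation by the matrix $\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$,
since

$$\mathbf{A} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} ax + by \\ cx + dy \end{bmatrix}.$$

The image of any given vector can then be calculated. For example, the matrix
$\mathbf{A} = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}$ maps $\begin{bmatrix} 2 \\ 1 \end{bmatrix}$ to

$$\mathbf{A} \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 8 \\ 6 \end{bmatrix}.$$


Exercise 2.13 


Consider the linear transformation that maps a vector $[x \;\; y]^T$ to the image
vector $[x + 2y \;\;\; 3x + 4y]^T$.


(a) Write down the matrix $\mathbf{A}$ for this linear transformation. 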


(b) Use the matrix A to find the image of each of the following vectors. 
Pas: cay = ‘ec 0 
(i) 1 | (ii) | 1 | (iii) | | 
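
For any linear transformation with matrix $\mathbf{A}$, the images of the Cartesian
unit vectors $\mathbf{i} = [1 \;\; 0]^T$ and $\mathbf{j} = [0 \;\; 1]^T$ are the columns of $\mathbf{A}$. To see this,
consider the general linear transformation that maps each vector $[x \;\; y]^T$ to
the image vector $[a_{11}x + a_{12}y \;\;\; a_{21}x + a_{22}y]^T$. Then

$$\text{the image of } \begin{bmatrix} 1 \\ 0 \end{bmatrix} \text{ is } \begin{bmatrix} a_{11} \times 1 + a_{12} \times 0 \\ a_{21} \times 1 + a_{22} \times 0 \end{bmatrix} = \begin{bmatrix} a_{11} \\ a_{21} \end{bmatrix}$$

and

$$\text{the image of } \begin{bmatrix} 0 \\ 1 \end{bmatrix} \text{ is } \begin{bmatrix} a_{11} \times 0 + a_{12} \times 1 \\ a_{21} \times 0 + a_{22} \times 1 \end{bmatrix} = \begin{bmatrix} a_{12} \\ a_{22} \end{bmatrix}.$$

These images are the columns of the matrix $\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$.
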
An interesting offshoot of these ideas concerns the determinant of the matrix
of a linear transformation.

Consider the general linear transformation with matrix $\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. The
image of $[1 \;\; 0]^T$ is $[a \;\; c]^T$, and the image of $[0 \;\; 1]^T$ is $[b \;\; d]^T$. It can
be shown that the unit square (with vertices at the points (0,0), (1,0),
(1,1) and (0,1)) is mapped to the parallelogram defined by these image
vectors, as shown in Figure 2.1. From Unit 4, we know that the area of
this parallelogram is the magnitude of the cross product of the two position
vectors $[a \;\; c]^T$ and $[b \;\; d]^T$, which is $|(a\mathbf{i} + c\mathbf{j}) \times (b\mathbf{i} + d\mathbf{j})| = |ad - bc|$. The
determinant $\det \mathbf{A}$ of the $2 \times 2$ matrix $\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is $ad - bc$. So the area
of the parallelogram is the magnitude of $\det \mathbf{A}$. But the parallelogram is the
image of the unit square defined by the two vectors $[1 \;\; 0]^T$ and $[0 \;\; 1]^T$, so
$|\det \mathbf{A}|$ is the area of the image of the unit square. Accordingly, the larger
$|\det \mathbf{A}|$, the larger the images of shapes under the transformation.

[Figure 2.1: the unit square and its image, the parallelogram defined by the points (a, c) and (b, d)]
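
The link between $|\det \mathbf{A}|$ and area can be checked numerically. The Python sketch below is an illustration only: the matrix is an arbitrary choice, the helper names are hypothetical, and the area of the image is computed independently with the shoelace formula for polygon area, which is not a method used in this unit.

```python
def det2(A):
    """Determinant ad - bc of a 2 x 2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = A
    return a * d - b * c


def apply(A, v):
    """Image of the vector v = (x, y) under the transformation with matrix A."""
    (a, b), (c, d) = A
    x, y = v
    return (a * x + b * y, c * x + d * y)


def polygon_area(points):
    """Area of a polygon from its vertices in order (shoelace formula)."""
    total = 0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


A = [[2, 1], [1, 3]]                              # arbitrary example matrix
unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
image = [apply(A, v) for v in unit_square]        # vertices of the parallelogram
print(image, polygon_area(image), abs(det2(A)))
```

The second and fourth image vertices are the columns of the matrix, in line with the earlier observation about the images of the Cartesian unit vectors.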


*Exercise 2.14 


For each of the following matrices, calculate det A and compare your answer 
with the area of the parallelogram defined by the images of the Cartesian 
unit vectors i and j. 


(a) Ales | (b) el 2 (c) A=|) | 


We can also link these ideas with systems of linear equations. For example, 
the system of linear equations 


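$$\begin{aligned}
x_1 + x_2 &= 0,\\
x_1 + 2x_2 &= 1,
\end{aligned} \tag{2.1}$$

can be written in matrix form as $\mathbf{Ax} = \mathbf{b}$, given by

$$\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}. \tag{2.2}$$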


Solving Equations (2.1) is equivalent to finding the vector x that is mapped 
to the vector b by the linear transformation with matrix A as shown in 
Equation (2.2). This is the ‘inverse process’ of our earlier work (for example 
in Exercise 2.13), where we were given x and asked to find the image vector 
b = Ax. This suggests that we might consider using the inverse matrix as 
a way of solving such systems of linear equations. 

For the matrix $\mathbf{A}$ in Equation (2.2),

$$\mathbf{A}^{-1} = \frac{1}{1} \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}.$$


Here $\det \mathbf{A} = 1$. 


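We wish to multiply the equation $\mathbf{Ax} = \mathbf{b}$ by $\mathbf{A}^{-1}$, but we must be careful
with the order of the multiplication. Multiplying both sides of the equation
on the left by $\mathbf{A}^{-1}$, we obtain

$$\mathbf{A}^{-1}\mathbf{Ax} = \mathbf{A}^{-1}\mathbf{b}.$$

Since $\mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$, we have

$$\mathbf{x} = \mathbf{A}^{-1}\mathbf{b},$$

so $\mathbf{x}$ is the image of $\mathbf{b}$ under transformation by the inverse matrix $\mathbf{A}^{-1}$.
Therefore

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \end{bmatrix}.$$

Thus $x_1 = -1$, $x_2 = 1$ is the solution of this system of linear equations.


Here we use the associative 
law: $\mathbf{A}^{-1}(\mathbf{Ax}) = (\mathbf{A}^{-1}\mathbf{A})\mathbf{x}$. 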


This matrix approach to solving a system of linear equations can be used 
whenever the matrix $\mathbf{A}^{-1}$ exists, i.e. whenever $\mathbf{A}$ is invertible. However, 
except for $2 \times 2$ matrices, the inverse matrix is usually tedious to calculate, 
and it is more efficient to use the Gaussian elimination method to solve the 
system of equations. 
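
For a $2 \times 2$ system the inverse-matrix approach is short enough to sketch in Python. The function name `solve_2x2` is hypothetical, integer coefficients are assumed so that `Fraction` keeps the arithmetic exact, and the sample data are the coefficients of the system whose solution was found above to be $x_1 = -1$, $x_2 = 1$.

```python
from fractions import Fraction


def solve_2x2(A, b):
    """Solve Ax = b for a 2 x 2 integer matrix A by forming x = A^-1 b."""
    (p, q), (r, s) = A
    det = p * s - q * r
    if det == 0:
        raise ValueError("A is non-invertible: no unique solution")
    # 2 x 2 inverse: swap diagonal entries, negate the others, divide by det A
    inv = [[Fraction(s, det), Fraction(-q, det)],
           [Fraction(-r, det), Fraction(p, det)]]
    return [inv[0][0] * b[0] + inv[0][1] * b[1],
            inv[1][0] * b[0] + inv[1][1] * b[1]]


x = solve_2x2([[1, 1], [1, 2]], [0, 1])
print([str(v) for v in x])
```

When the determinant is zero the sketch raises an error rather than returning a vector, since then the system has either no solution or infinitely many.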


We come now to a result that is important with respect to the material in 
the next unit. Suppose that $\mathbf{b} = \mathbf{0}$, so that we are looking for a solution to 
$\mathbf{Ax} = \mathbf{0}$. What can we say about $\mathbf{x}$? If $\mathbf{A}$ has an inverse, then multiplying 
both sides of $\mathbf{Ax} = \mathbf{0}$ on the left by $\mathbf{A}^{-1}$ gives $\mathbf{A}^{-1}\mathbf{Ax} = \mathbf{A}^{-1}\mathbf{0}$, so $\mathbf{x} = \mathbf{0}$. 
Therefore, if $\mathbf{Ax} = \mathbf{0}$ is to have a non-zero solution $\mathbf{x}$, then $\mathbf{A}$ cannot have 
an inverse, i.e. it must be non-invertible. So we have the following result. 


Non-invertible square matrix 


If $\mathbf{A}$ is a square matrix and $\mathbf{Ax} = \mathbf{0}$ with $\mathbf{x} \neq \mathbf{0}$, then $\mathbf{A}$ is non-invertible.


So $\det \mathbf{A} = 0$. 


2.3 Determinants 


In this subsection we summarize the main properties of determinants of 2 x 2 
matrices and extend the ideas to n x n matrices. 


Properties of 2 x 2 determinants 


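Recall that if $\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, then $\det \mathbf{A} = ad - bc$.
We frequently use the ‘vertical line’ notation for determinants:

$$\det \mathbf{A} = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc.$$


Earlier in this section you saw that if $\det \mathbf{A} \neq 0$, then $\mathbf{A}$ is invertible. In
the following exercise we investigate some further properties of $2 \times 2$
determinants.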


Exercise 2.15 


Calculate the following determinants. In Exercise 2.15 we have 


(with respect to the 
(a) . Hl ; : and , : (b) c ; and | : 7 matrix A, where appropriate) 
. illustrations of: 
b ka kb ba. Hi (a) lower and upper 
(c) ; ; (d) iy bh and | | (e) a d | and - d | triangular matrices, and a 
diagonal matrix; 
ka kb a a—me b—md (b) interchanges of rows and 
(f) a | (g) a a ar and a columns; 
(c) a transpose; 
d b (d) linearly dependent rows 
(h) ad — be ad — be and columns; 
c a (e) multiplying rows and 
ad—bc ad—bc columns by a scalar; 


(f) multiplying each element 
by a scalar; 


where A = | * b : (g) subtracting a multiple of 
ce d one row from another; 


In parts (b), (c), (e), (f), (g) and (h), compare your answer with det A, 


(h) an inverse matrix. 

We shall refer back to the 

results in Exercise 2.15 later. 
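
Introducing 3 x 3 and n x n determinants 


Just as the magnitude of the $2 \times 2$ determinant

$$\begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix}$$

represents the area of the parallelogram defined by the vectors $[a_1 \;\; a_2]^T$
and $[b_1 \;\; b_2]^T$, so we can define a $3 \times 3$ determinant

$$\begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}$$

whose magnitude represents the volume of the parallelepiped defined by
the vectors $\mathbf{a} = [a_1 \;\; a_2 \;\; a_3]^T$, $\mathbf{b} = [b_1 \;\; b_2 \;\; b_3]^T$ and $\mathbf{c} = [c_1 \;\; c_2 \;\; c_3]^T$, as
shown in Figure 2.2.

[Figure 2.2: the parallelepiped defined by the vectors a, b and c]
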
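Definition 

The determinant of the $3 \times 3$ matrix

$$\mathbf{A} = \begin{bmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{bmatrix}$$

is given by

$$\begin{aligned}
\det \mathbf{A} &= a_1 \begin{vmatrix} b_2 & b_3 \\ c_2 & c_3 \end{vmatrix} - a_2 \begin{vmatrix} b_1 & b_3 \\ c_1 & c_3 \end{vmatrix} + a_3 \begin{vmatrix} b_1 & b_2 \\ c_1 & c_2 \end{vmatrix} \\
&= a_1b_2c_3 - a_1b_3c_2 - a_2b_1c_3 + a_2b_3c_1 + a_3b_1c_2 - a_3b_2c_1.
\end{aligned} \tag{2.3}$$


This is sometimes known as 
‘expanding the determinant 
by the top row’. Notice the 
minus sign before the second 
term. 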


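As before, we frequently use ‘vertical line’ notation. For example,

$$\begin{vmatrix} 1 & 2 & 4 \\ 3 & 1 & -1 \\ 2 & 5 & 6 \end{vmatrix}
= 1 \begin{vmatrix} 1 & -1 \\ 5 & 6 \end{vmatrix} - 2 \begin{vmatrix} 3 & -1 \\ 2 & 6 \end{vmatrix} + 4 \begin{vmatrix} 3 & 1 \\ 2 & 5 \end{vmatrix}
= 11 - 40 + 52 = 23.$$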

A simple way to remember formula (2.3) is related to Sarrus’s Rule (see
Unit 4). Draw a tableau with $a_1$, $a_2$ and $a_3$ in the top row, then repeat $a_1$
and $a_2$. In the second row do the same with the second row of the matrix,
and in the third row with the third. Then following the diagonal lines as
shown, and multiplying the entries, gives the corresponding terms of the
determinant $\det \mathbf{A}$, which are the elements on the fourth row of the tableau.

$$\begin{array}{ccccc}
a_1 & a_2 & a_3 & a_1 & a_2 \\
b_1 & b_2 & b_3 & b_1 & b_2 \\
c_1 & c_2 & c_3 & c_1 & c_2
\end{array}$$

$$-a_3b_2c_1 \qquad -a_1b_3c_2 \qquad -a_2b_1c_3 \qquad a_1b_2c_3 \qquad a_2b_3c_1 \qquad a_3b_1c_2$$

(The diagonals pointing to the right yield a positive term, those pointing
left have a minus sign.)
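
The two ways of evaluating a $3 \times 3$ determinant described so far, expansion by the top row and the six-term tableau, can be compared in a short Python sketch. The function names `det3_top_row` and `det3_tableau` are hypothetical, and a matrix is assumed to be given as a list of three rows.

```python
def det3_top_row(M):
    """3 x 3 determinant by expanding along the top row."""
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = M
    return (a1 * (b2 * c3 - b3 * c2)
            - a2 * (b1 * c3 - b3 * c1)
            + a3 * (b1 * c2 - b2 * c1))


def det3_tableau(M):
    """3 x 3 determinant as the six signed diagonal products of the tableau."""
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = M
    positive = a1 * b2 * c3 + a2 * b3 * c1 + a3 * b1 * c2   # right-pointing diagonals
    negative = a3 * b2 * c1 + a1 * b3 * c2 + a2 * b1 * c3   # left-pointing diagonals
    return positive - negative


U = [[1, 2, 3], [0, 4, 5], [0, 0, 6]]
print(det3_top_row(U), det3_tableau(U))
```

Both functions return the same value for any input, since the six terms of the tableau are exactly the terms obtained by multiplying out the top-row expansion.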


*Exercise 2.16 


Evaluate the following determinants. 


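(a) $\begin{vmatrix} 4 & 1 & 0 \\ 0 & 2 & -1 \\ 2 & 3 & 1 \end{vmatrix}$ (b) $\begin{vmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{vmatrix}$ (c) $\begin{vmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{vmatrix}$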


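The result of using the tableau shown above does not change under cyclic
permutations of the rows (i.e. of the symbols a, b, c). However, as you can
check, it changes sign if two adjacent rows are interchanged. This shows that
the determinant can be expanded using any row of the matrix, provided that
the rows are then taken in the correct order. If the order is reversed, then a
minus sign is introduced. For example,

$$\begin{aligned}
\det \mathbf{A} &= a_1 \begin{vmatrix} b_2 & b_3 \\ c_2 & c_3 \end{vmatrix} - a_2 \begin{vmatrix} b_1 & b_3 \\ c_1 & c_3 \end{vmatrix} + a_3 \begin{vmatrix} b_1 & b_2 \\ c_1 & c_2 \end{vmatrix} \\
&= -a_1 \begin{vmatrix} c_2 & c_3 \\ b_2 & b_3 \end{vmatrix} + a_2 \begin{vmatrix} c_1 & c_3 \\ b_1 & b_3 \end{vmatrix} - a_3 \begin{vmatrix} c_1 & c_2 \\ b_1 & b_2 \end{vmatrix} \\
&= -b_1 \begin{vmatrix} a_2 & a_3 \\ c_2 & c_3 \end{vmatrix} + b_2 \begin{vmatrix} a_1 & a_3 \\ c_1 & c_3 \end{vmatrix} - b_3 \begin{vmatrix} a_1 & a_2 \\ c_1 & c_2 \end{vmatrix} \\
&= c_1 \begin{vmatrix} a_2 & a_3 \\ b_2 & b_3 \end{vmatrix} - c_2 \begin{vmatrix} a_1 & a_3 \\ b_1 & b_3 \end{vmatrix} + c_3 \begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix}.
\end{aligned}$$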

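Another interesting thing occurs if we use the tableau method to evaluate
$\det \mathbf{A}^T$ (the determinant of the transpose of $\mathbf{A}$).

$$\begin{array}{ccccc}
a_1 & b_1 & c_1 & a_1 & b_1 \\
a_2 & b_2 & c_2 & a_2 & b_2 \\
a_3 & b_3 & c_3 & a_3 & b_3
\end{array}$$

$$-a_3b_2c_1 \qquad -a_1b_3c_2 \qquad -a_2b_1c_3 \qquad a_1b_2c_3 \qquad a_3b_1c_2 \qquad a_2b_3c_1$$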


This tableau method does not 
extend to $n \times n$ matrices for 
$n > 3$. 



As you can see,

$$\det(\mathbf{A}^T) = \det \mathbf{A},$$

a result that extends to $n \times n$ matrices (though we shall not prove this). 


This means that to evaluate det A we can expand using columns in place of 
rows, as shown in Example 2.2 below. 


In fact, these observations make it easy to evaluate any 3 x 3 determinant 
in which one row or column is particularly simple. 


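Example 2.2 

Evaluate the determinant of each of the following.

(a) $\mathbf{A} = \begin{bmatrix} 8 & 9 & -6 \\ 1 & 0 & 0 \\ 32 & -7 & 14 \end{bmatrix}$ (b) $\mathbf{B} = \begin{bmatrix} 8 & 9 & 0 \\ 13 & -4 & 2 \\ -6 & 2 & 0 \end{bmatrix}$


Solution 


(a) We can expand $\det \mathbf{A}$ by the second row:

$$\begin{aligned}
\begin{vmatrix} 8 & 9 & -6 \\ 1 & 0 & 0 \\ 32 & -7 & 14 \end{vmatrix}
&= -1 \begin{vmatrix} 9 & -6 \\ -7 & 14 \end{vmatrix} + 0 \begin{vmatrix} 8 & -6 \\ 32 & 14 \end{vmatrix} - 0 \begin{vmatrix} 8 & 9 \\ 32 & -7 \end{vmatrix} \\
&= -1(126 - 42) \\
&= -84.
\end{aligned}$$

(b) We can expand $\det \mathbf{B}$ by the third column:

$$\begin{aligned}
\begin{vmatrix} 8 & 9 & 0 \\ 13 & -4 & 2 \\ -6 & 2 & 0 \end{vmatrix}
&= 0 \begin{vmatrix} 13 & -4 \\ -6 & 2 \end{vmatrix} - 2 \begin{vmatrix} 8 & 9 \\ -6 & 2 \end{vmatrix} + 0 \begin{vmatrix} 8 & 9 \\ 13 & -4 \end{vmatrix} \\
&= -2(16 + 54) \\
&= -140.
\end{aligned}$$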


The armoury of techniques for evaluating determinants can be expanded by 
noting some general rules. 


Rules for n x n determinants 
(a) If $\mathbf{A}$ is a diagonal, upper triangular or lower triangular matrix, then 
$\det \mathbf{A}$ is the product of the diagonal entries. 


(b) Interchanging any two rows or any two columns of $\mathbf{A}$ changes the 
sign of $\det \mathbf{A}$. 

(c) $\det(\mathbf{A}^T) = \det \mathbf{A}$. 

(d) If the rows or columns of $\mathbf{A}$ are linearly dependent, then $\det \mathbf{A} = 0$; 
otherwise $\det \mathbf{A} \neq 0$. 


(e) Multiplying any row or any column of $\mathbf{A}$ by a scalar $k$ multiplies 
$\det \mathbf{A}$ by $k$. 


(f) For any number $k$, $\det(k\mathbf{A}) = k^n \det \mathbf{A}$. 


(g) Adding a multiple of one row of $\mathbf{A}$ to another row does not change 
$\det \mathbf{A}$. 


(h) The matrix $\mathbf{A}$ is non-invertible if and only if $\det \mathbf{A} = 0$. 
If $\det \mathbf{A} \neq 0$, then $\det(\mathbf{A}^{-1}) = 1/\det \mathbf{A}$. 


(i) For any two $n \times n$ matrices $\mathbf{A}$ and $\mathbf{B}$ we have 
$\det(\mathbf{AB}) = \det \mathbf{A} \, \det \mathbf{B}$. 

The general n x n 
determinant is defined below. 
You may find it helpful to 
compare these general rules 
with the results we obtained 
for a 2 x 2 matrix in 

Exercise 2.15. The letters (a) 
to (h) here relate to the 
relevant parts of that exercise. 


The multiple can be negative 
or positive, so (g) covers 
subtracting a multiple of one 
row from another row. 


Rule (d) tells us that $\det \mathbf{A} = 0$ if and only if the rows of $\mathbf{A}$ are linearly 
dependent. Hence any system of linear equations with coefficient matrix $\mathbf{A}$ 
has a unique solution if and only if $\det \mathbf{A} \neq 0$. If $\det \mathbf{A} = 0$, then the system 
has an infinite number of solutions or no solution. 


Exercise 2.17 


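(a) Calculate

$$\begin{vmatrix} 2 & 1 & 3 \\ 0 & 2 & 1 \\ 3 & 1 & 6 \end{vmatrix}.$$


(b) Use the result of part (a) and the above rules to write down the values
of the following determinants.

(i) $\begin{vmatrix} 2 & 1 & 3 \\ 3 & 1 & 6 \\ 0 & 2 & 1 \end{vmatrix}$ (ii) $\begin{vmatrix} 2 & 0 & 3 \\ 1 & 2 & 1 \\ 3 & 1 & 6 \end{vmatrix}$ (iii) $\begin{vmatrix} 2 & 3 & 3 \\ 3 & 3 & 6 \\ 0 & 6 & 1 \end{vmatrix}$


(c) Let $\mathbf{A} = \begin{bmatrix} 1 & 3 & 2 \\ 0 & 1 & -1 \\ 0 & 2 & 3 \end{bmatrix}$. Calculate $\det \mathbf{A}$. Subtract twice the second
row of $\mathbf{A}$ from the third row to obtain an upper triangular matrix $\mathbf{U}$.
Calculate $\det \mathbf{U}$ using the definition of a $3 \times 3$ determinant, and compare
this with the product of the diagonal elements of $\mathbf{U}$.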


It is also possible to define larger determinants. To do this, we proceed one 
step at a time, defining an n x n determinant in terms of (n — 1) x (n — 1) 
determinants. For example, to define a 4 x 4 determinant, we write 


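$$\begin{aligned}
\begin{vmatrix} a_1 & a_2 & a_3 & a_4 \\ b_1 & b_2 & b_3 & b_4 \\ c_1 & c_2 & c_3 & c_4 \\ d_1 & d_2 & d_3 & d_4 \end{vmatrix}
= {} & a_1 \begin{vmatrix} b_2 & b_3 & b_4 \\ c_2 & c_3 & c_4 \\ d_2 & d_3 & d_4 \end{vmatrix}
- a_2 \begin{vmatrix} b_1 & b_3 & b_4 \\ c_1 & c_3 & c_4 \\ d_1 & d_3 & d_4 \end{vmatrix} \\
& + a_3 \begin{vmatrix} b_1 & b_2 & b_4 \\ c_1 & c_2 & c_4 \\ d_1 & d_2 & d_4 \end{vmatrix}
- a_4 \begin{vmatrix} b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \\ d_1 & d_2 & d_3 \end{vmatrix}.
\end{aligned}$$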


Except in special cases, the calculation of large determinants like this can 
be very tedious. However, rule (g) for determinants provides the clue to 
a simpler method. Procedure 1.1 (Gaussian elimination) applied to the 
matrix A consists of a sequence of row operations where a multiple of one 
row is added to another. Each such operation does not change the value 
of the determinant. Thus we can deduce that det A = det U, where U is 
the upper triangular matrix obtained at the end of the elimination stage. 
Since the determinant of an upper triangular matrix is the product of the 
diagonal elements, Procedure 1.1 also provides an efficient way of calculating 
determinants of any size. 
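
The idea in the paragraph above can be sketched in Python. This is a hedged illustration, not a transcription of Procedure 1.1: the function name `det_by_elimination` is chosen here, exact arithmetic uses the standard-library `fractions` module, and a row interchange is made only when a pivot is zero, with the sign of the result flipped each time as rule (b) requires.

```python
from fractions import Fraction


def det_by_elimination(A):
    """Determinant of a square matrix via reduction to upper triangular form."""
    n = len(A)
    U = [[Fraction(x) for x in row] for row in A]
    sign = 1
    for col in range(n):
        # find a non-zero pivot in this column, at or below the diagonal
        pivot = next((r for r in range(col, n) if U[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)          # no pivot available: determinant is zero
        if pivot != col:
            U[col], U[pivot] = U[pivot], U[col]
            sign = -sign                # rule (b): an interchange changes the sign
        for r in range(col + 1, n):
            m = U[r][col] / U[col][col]
            # rule (g): subtracting a multiple of a row leaves det unchanged
            U[r] = [x - m * y for x, y in zip(U[r], U[col])]
    det = Fraction(sign)
    for i in range(n):
        det *= U[i][i]                  # rule (a): product of the diagonal entries
    return det


print(det_by_elimination([[1, 3, 2], [0, 1, -1], [0, 2, 3]]))
```

For an $n \times n$ matrix this uses on the order of $n^3$ arithmetic operations, whereas repeated expansion by rows grows factorially with $n$, which is why elimination is preferred for determinants of any appreciable size.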


Exercise 2.18 


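In Example 1.3 we applied the Gaussian elimination method to the matrix

$$\mathbf{A} = \begin{bmatrix} 1 & -4 & 2 \\ 3 & -2 & 3 \\ 8 & -2 & 9 \end{bmatrix} \quad \text{to obtain} \quad \mathbf{U} = \begin{bmatrix} 1 & -4 & 2 \\ 0 & 10 & -3 \\ 0 & 0 & 2 \end{bmatrix}.$$

Calculate $\det \mathbf{A}$ and $\det \mathbf{U}$, and hence show that $\det \mathbf{A} = \det \mathbf{U}$.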


Note that since 
$\det(\mathbf{A}^T) = \det \mathbf{A}$, we can also 
deduce that $\det \mathbf{A} = 0$ if and 
only if the columns of $\mathbf{A}$ are 
linearly dependent. 


Note that alternate terms in 
the expansion have a minus 
sign. 


If we need to make essential 
row interchanges to avoid a 
zero pivot, we note that each 
row interchange will change 
the sign of the determinant 
(see rule (b)). 

2.4 Some applications 


We conclude this section by illustrating how 3 x 3 determinants can be 
used to represent areas and volumes, and products of vectors. We start by 
revisiting the scalar triple product. 


Scalar triple product 


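Let $\mathbf{a} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}$, $\mathbf{b} = b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k}$ and $\mathbf{c} = c_1\mathbf{i} + c_2\mathbf{j} + c_3\mathbf{k}$. The
volume of the parallelepiped with sides defined by the vectors $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$ is
given by the magnitude of the scalar triple product $(\mathbf{b} \times \mathbf{c}) \cdot \mathbf{a}$, which you
will recall from Unit 4 can also be written as $\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c})$. The cross product
of vectors $\mathbf{b}$ and $\mathbf{c}$ is

$$\begin{aligned}
\mathbf{b} \times \mathbf{c} &= (b_2c_3 - b_3c_2)\mathbf{i} + (b_3c_1 - b_1c_3)\mathbf{j} + (b_1c_2 - b_2c_1)\mathbf{k} \\
&= \begin{vmatrix} b_2 & b_3 \\ c_2 & c_3 \end{vmatrix} \mathbf{i} - \begin{vmatrix} b_1 & b_3 \\ c_1 & c_3 \end{vmatrix} \mathbf{j} + \begin{vmatrix} b_1 & b_2 \\ c_1 & c_2 \end{vmatrix} \mathbf{k},
\end{aligned}$$

hence

$$\begin{aligned}
\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}) &= a_1 \begin{vmatrix} b_2 & b_3 \\ c_2 & c_3 \end{vmatrix} - a_2 \begin{vmatrix} b_1 & b_3 \\ c_1 & c_3 \end{vmatrix} + a_3 \begin{vmatrix} b_1 & b_2 \\ c_1 & c_2 \end{vmatrix} \\
&= \begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}.
\end{aligned} \tag{2.4}$$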
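
Equation (2.4) can be checked numerically with a short Python sketch. The vectors below are arbitrary sample values and the helper names `cross`, `dot` and `det3` are hypothetical; vectors are assumed to be stored as tuples of three components.

```python
def cross(b, c):
    """Cross product b x c of two 3-component vectors."""
    return (b[1] * c[2] - b[2] * c[1],
            b[2] * c[0] - b[0] * c[2],
            b[0] * c[1] - b[1] * c[0])


def dot(a, b):
    """Dot product of two vectors of the same length."""
    return sum(x * y for x, y in zip(a, b))


def det3(M):
    """3 x 3 determinant by expanding along the top row."""
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = M
    return (a1 * (b2 * c3 - b3 * c2)
            - a2 * (b1 * c3 - b3 * c1)
            + a3 * (b1 * c2 - b2 * c1))


a, b, c = (1, 0, 2), (0, 3, 1), (2, 1, 0)   # arbitrary sample vectors
triple = dot(a, cross(b, c))
print(triple, det3([a, b, c]), abs(triple))  # abs(...) is the parallelepiped volume
```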


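*Exercise 2.19 


Use the result of Exercise 2.17(a) to find the scalar triple product $\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c})$
when $\mathbf{a} = 2\mathbf{i} + \mathbf{j} + 3\mathbf{k}$, $\mathbf{b} = 2\mathbf{j} + \mathbf{k}$ and $\mathbf{c} = 3\mathbf{i} + \mathbf{j} + 6\mathbf{k}$.


Exercise 2.20 


Use Equation (2.4) to find the volume of the parallelepiped defined by the
vectors $\mathbf{a} = \mathbf{i} + \mathbf{k}$, $\mathbf{b} = \mathbf{i} + 2\mathbf{j}$ and $\mathbf{c} = \mathbf{j} + 3\mathbf{k}$.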


Cross product 


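The similarity between the formula for a $3 \times 3$ determinant and Sarrus’s Rule
(in Unit 4) for evaluating the cross product of two vectors gives another way
of remembering the formula for the cross product. We have seen above that
if $\mathbf{b} = [b_1 \;\; b_2 \;\; b_3]^T$ and $\mathbf{c} = [c_1 \;\; c_2 \;\; c_3]^T$, then

$$\mathbf{b} \times \mathbf{c} = \begin{vmatrix} b_2 & b_3 \\ c_2 & c_3 \end{vmatrix} \mathbf{i} - \begin{vmatrix} b_1 & b_3 \\ c_1 & c_3 \end{vmatrix} \mathbf{j} + \begin{vmatrix} b_1 & b_2 \\ c_1 & c_2 \end{vmatrix} \mathbf{k}.$$

This expression can be remembered more easily if we write it as a $3 \times 3$
determinant:

$$\mathbf{b} \times \mathbf{c} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}.$$
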
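Exercise 2.21 


Express $\mathbf{b} \times \mathbf{c}$ as a determinant, and calculate its value, when: 
(a) $\mathbf{b} = 3\mathbf{i} + 2\mathbf{j} - 4\mathbf{k}$ and $\mathbf{c} = \mathbf{i} - \mathbf{j} + 3\mathbf{k}$; 
(b) $\mathbf{b} = [1 \;\; 2 \;\; 3]^T$ and $\mathbf{c} = [6 \;\; 5 \;\; 4]^T$. 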

Scalar triple products and the 
other topics in this subsection 
were mentioned in Unit 4. 




Area of a triangle in the (x, y)-plane 


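Consider the triangle defined by the origin and the points with position
vectors $\mathbf{a} = [a_1 \;\; a_2 \;\; 0]^T$ and $\mathbf{b} = [b_1 \;\; b_2 \;\; 0]^T$, as shown in Figure 2.3. Its
area $A$ is given by

$$\begin{aligned}
A &= \tfrac{1}{2} |\mathbf{a} \times \mathbf{b}| \\
&= \tfrac{1}{2} \left| \det \begin{bmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ a_1 & a_2 & 0 \\ b_1 & b_2 & 0 \end{bmatrix} \right| \\
&= \tfrac{1}{2} |(a_1b_2 - a_2b_1)\mathbf{k}| \\
&= \tfrac{1}{2} |a_1b_2 - a_2b_1| \\
&= \tfrac{1}{2} \left| \det \begin{bmatrix} a_1 & a_2 \\ b_1 & b_2 \end{bmatrix} \right|.
\end{aligned} \tag{2.5}$$

[Figure 2.3: the triangle defined by the origin and the position vectors a and b]


We use det for determinant 
here, rather than | |, to avoid 
confusion with the modulus 
function, which is also 
present. 


The formula (2.5) agrees with our earlier interpretation of the magnitude of
the determinant as the area of the parallelogram defined by $\mathbf{a}$ and $\mathbf{b}$.


We can extend this result to give a formula containing determinants for the
area of a triangle whose vertices have position vectors $\mathbf{a} = [a_1 \;\; a_2 \;\; 0]^T$,
$\mathbf{b} = [b_1 \;\; b_2 \;\; 0]^T$ and $\mathbf{c} = [c_1 \;\; c_2 \;\; 0]^T$. Two sides of the triangle are given
by the vectors $\mathbf{a} - \mathbf{c}$ and $\mathbf{b} - \mathbf{c}$, so we can find the area as follows:

$$\begin{aligned}
A &= \tfrac{1}{2} |(\mathbf{a} - \mathbf{c}) \times (\mathbf{b} - \mathbf{c})| \\
&= \tfrac{1}{2} |(\mathbf{a} \times \mathbf{b}) - (\mathbf{a} \times \mathbf{c}) + (\mathbf{b} \times \mathbf{c})| \\
&= \tfrac{1}{2} \left| \det \begin{bmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ a_1 & a_2 & 0 \\ b_1 & b_2 & 0 \end{bmatrix} - \det \begin{bmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ a_1 & a_2 & 0 \\ c_1 & c_2 & 0 \end{bmatrix} + \det \begin{bmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ b_1 & b_2 & 0 \\ c_1 & c_2 & 0 \end{bmatrix} \right| \\
&= \tfrac{1}{2} |(a_1b_2 - a_2b_1)\mathbf{k} - (a_1c_2 - a_2c_1)\mathbf{k} + (b_1c_2 - b_2c_1)\mathbf{k}| \\
&= \tfrac{1}{2} |(a_1b_2 - a_2b_1) - (a_1c_2 - a_2c_1) + (b_1c_2 - b_2c_1)| \\
&= \tfrac{1}{2} \left| \det \begin{bmatrix} 1 & 1 & 1 \\ a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \end{bmatrix} \right|.
\end{aligned} \tag{2.6}$$


Remember that $\mathbf{c} \times \mathbf{c} = \mathbf{0}$ 
and that $-\mathbf{c} \times \mathbf{b} = \mathbf{b} \times \mathbf{c}$ 
(from Unit 4). 
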
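
Formula (2.6) is easy to try out in code. The Python sketch below uses a hypothetical function name, `triangle_area`, takes each vertex as a pair of integer coordinates (an assumption made so that `Fraction` keeps the result exact), and expands the $3 \times 3$ determinant along its top row of ones.

```python
from fractions import Fraction


def triangle_area(a, b, c):
    """Area of the triangle with vertices a, b, c (integer coordinate pairs),
    using half the magnitude of det [[1, 1, 1], [a1, b1, c1], [a2, b2, c2]]."""
    (a1, a2), (b1, b2), (c1, c2) = a, b, c
    # top-row expansion of the determinant in formula (2.6)
    det = (b1 * c2 - c1 * b2) - (a1 * c2 - c1 * a2) + (a1 * b2 - b1 * a2)
    return Fraction(abs(det), 2)


print(triangle_area((0, 0), (4, 0), (0, 3)))   # right-angled triangle, legs 4 and 3
print(triangle_area((0, 0), (1, 1), (2, 2)))   # collinear points enclose no area
```

A zero result signals that the three points are collinear, which corresponds to the determinant in (2.6) vanishing.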
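*Exercise 2.22 


Use Equation (2.6) to find the area of the triangle whose vertices are
$\mathbf{a} = 3\mathbf{i} + \mathbf{j}$, $\mathbf{b} = \mathbf{i} + 2\mathbf{j}$ and $\mathbf{c} = 2\mathbf{i} - \mathbf{j}$ (see Figure 2.4).

[Figure 2.4: the triangle with vertices a, b and c]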


End-of-section Exercises 


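Exercise 2.23 


Given the symmetric matrices $\mathbf{A} = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 3 & -1 \\ 1 & -1 & 0 \end{bmatrix}$ and $\mathbf{B} = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}$,
show that $\mathbf{AB}$ is not symmetric. Verify that $(\mathbf{AB})^T = \mathbf{B}^T\mathbf{A}^T$.

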
Exercise 2.24 


Given A = | ; i]. calculate det A and A~!. Hence write down the so- 


-—1 1 
lution of the system of equations 


r+2y= 1, 
=f:+ yoo. 

3 Matrices in action 


In Section 1 we used matrix notation to describe the solution of systems 
of linear equations by the Gaussian elimination method. The ubiquitous 
nature of matrices is partially explained by the fact that such systems of 
equations arise in many areas of applied mathematics, numerical analysis 
and statistics. In this section we look at two applications that involve solv- 
ing systems of linear equations: polynomial interpolation and least squares 
approximations. In each application, we are given a set of $n + 1$ data points
$(x_i, y_i)$, $i = 0, 1, \ldots, n$, where $x_0 < x_1 < \cdots < x_n$, as shown in Figure 3.1.


In the first application, we determine the polynomial $y(x) = a_0 + a_1 x + \cdots + a_n x^n$
such that $y(x_i) = y_i$, $i = 0, 1, \ldots, n$. This polynomial, defined on the
interval $x_0 \le x \le x_n$, is called the interpolating polynomial. The graph
of such a function passes through each data point. 


The second application arises, for example, when we conduct an experi- 
ment and, for each x;, the corresponding measurement of y; contains an 
experimental error. In such situations it may be preferable to construct a 
polynomial of lower degree that ‘best approximates’ the data, rather than a 
polynomial that passes through every point. 


3.1 Polynomial interpolation 


We are often given a table of values showing the variation of one variable
with another. For example, in Example 2.1 of Unit 2 we met the initial-value
problem

$$
\frac{dy}{dx} = x + y, \quad y(0) = 1. \tag{3.1}
$$

Using Euler's method with step size $h = 0.2$, we obtained a table of
approximate values for the solution (see Table 3.1).
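The entries of Table 3.1 can be regenerated with a short sketch of Euler's method applied to problem (3.1). This is only an illustration in Python, not the course's computer algebra package, and the function name `euler` is an assumption made for the example.

```python
def euler(f, x0, y0, h, steps):
    """Return the points (x_i, Y_i) produced by Euler's method for dy/dx = f(x, y)."""
    points = [(x0, y0)]
    x, y = x0, y0
    for i in range(1, steps + 1):
        y = y + h * f(x, y)   # Y_{i+1} = Y_i + h f(x_i, Y_i)
        x = x0 + i * h        # recompute x from i to limit accumulated rounding
        points.append((x, y))
    return points


# Problem (3.1): dy/dx = x + y, y(0) = 1, with step size h = 0.2.
table = euler(lambda x, y: x + y, 0.0, 1.0, 0.2, 5)
for i, (x, y) in enumerate(table):
    print(i, round(x, 1), round(y, 5))
```

The printed values are rounded only for display; the unrounded numbers differ from the tabulated ones by floating-point error alone.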


Suppose that we wish to approximate the solution at x = 0.47. One way of 
doing this is to construct a polynomial through some or all of the data values 
and then use this interpolating polynomial to approximate the solution at 
x = 0.47. There are many ways of constructing interpolating polynomials, 
but we shall present just one method here. We start with a straight line 
approximation through the two points closest to x = 0.47. 


Example 3.1 


Find the equation of the straight line through (0.4,1.48) and (0.6, 1.856), 
and use it to approximate the solution of system (3.1) at x = 0.47. 


Solution 


Suppose that the line is $y = a_0 + a_1 x$. Since, from Table 3.1, $y = 1.48$ when
$x = 0.4$, and $y = 1.856$ when $x = 0.6$, we obtain the system of equations

$$
\begin{aligned}
a_0 + 0.4a_1 &= 1.48, \\
a_0 + 0.6a_1 &= 1.856.
\end{aligned}
$$

The solution of these equations is $a_0 = 0.728$, $a_1 = 1.88$. Hence the equation
of the straight line is

$$
y = 0.728 + 1.88x.
$$

When $x = 0.47$, we have $y = 0.728 + 1.88 \times 0.47 = 1.6116$.
Figure 3.1

Table 3.1

i    x_i    Y_i
0    0      1
1    0.2    1.2
2    0.4    1.48
3    0.6    1.856
4    0.8    2.3472
5    1.0    2.97664


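In general, if we require the line through $(x_0, y_0)$ and $(x_1, y_1)$, we have (as
in Example 3.1)

$$
\begin{aligned}
a_0 + a_1 x_0 &= y_0, \\
a_0 + a_1 x_1 &= y_1,
\end{aligned}
$$

which can be written in matrix form as

$$
\begin{bmatrix} 1 & x_0 \\ 1 & x_1 \end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \end{bmatrix}
=
\begin{bmatrix} y_0 \\ y_1 \end{bmatrix};
$$

that is,

$$
X\mathbf{a} = \mathbf{y}, \tag{3.2}
$$

where the matrix $X$ contains the given values $x_i$, and the vector $\mathbf{y}$ contains
the given values $y_i$ ($i = 0, 1$). We wish to determine the vector $\mathbf{a}$, and this
could be done, for example, by using Gaussian elimination.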
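For a system as small as (3.2), the solution can also be written down directly from the inverse of the $2 \times 2$ matrix $X$. The following Python sketch does this for the data of Example 3.1; the function name `line_through` is hypothetical, and the closed-form inverse is used here in place of Gaussian elimination purely for brevity.

```python
def line_through(p0, p1):
    """Solve [[1, x0], [1, x1]] [a0, a1]^T = [y0, y1]^T for the line y = a0 + a1*x."""
    (x0, y0), (x1, y1) = p0, p1
    det = x1 - x0                      # det X, non-zero when x0 != x1
    a0 = (x1 * y0 - x0 * y1) / det     # first row of X^{-1} applied to y
    a1 = (y1 - y0) / det               # second row of X^{-1} applied to y
    return a0, a1


# Data of Example 3.1, then the estimate at x = 0.47.
a0, a1 = line_through((0.4, 1.48), (0.6, 1.856))
estimate = a0 + a1 * 0.47
print(round(a0, 4), round(a1, 4), round(estimate, 4))  # 0.728 1.88 1.6116
```

The requirement that `det` be non-zero is the two-point case of the condition that all the $x_i$ values are different.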


Exercise 3.1 


Consider the data in Table 3.1. Find the straight line through the points 
(0.6, 1.856) and (0.8,2.3472), and use it to find an approximation to the 
value of y at x = 0.65 and at x = 0.47. 


We now have two approximations for $y(0.47)$, which do not agree, even to
one decimal place. However, in Example 3.1 the value of $x = 0.47$ lies within
the domain $0.4 \le x \le 0.6$ of the interpolating polynomial, and we have
interpolated a straight line to obtain the approximation. In Exercise 3.1 the
value of $x = 0.47$ lies outside the domain $0.6 \le x \le 0.8$ of the interpolating
polynomial, and we have extrapolated a straight line to obtain the
approximation. Extrapolation is, in general, less accurate than interpolation, and
we should use the result for the approximate value of $y(0.47)$ from
Example 3.1 rather than that from Exercise 3.1.


In general, given n+ 1 data points, we can determine the interpolating 
polynomial of degree n of the form 


$$
y = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n \tag{3.3}
$$
in a similar way, so we can fit a straight line through two points, a quadratic 
through three points, a cubic through four points, and so on. 

Example 3.2 


Find the interpolating quadratic polynomial for the three data points 
(0.4, 1.48), (0.6, 1.856) and (0.8, 2.3472), and use it to find an approximation 
to the value of y at x = 0.47. 


Solution 
The three data points give rise to the system of linear equations

$$
\begin{aligned}
a_0 + 0.4a_1 + (0.4)^2 a_2 &= 1.48, \\
a_0 + 0.6a_1 + (0.6)^2 a_2 &= 1.856, \\
a_0 + 0.8a_1 + (0.8)^2 a_2 &= 2.3472,
\end{aligned}
$$

that is,

$$
\underbrace{\begin{bmatrix} 1 & 0.4 & 0.16 \\ 1 & 0.6 & 0.36 \\ 1 & 0.8 & 0.64 \end{bmatrix}}_{X}
\underbrace{\begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix}}_{\mathbf{a}}
=
\underbrace{\begin{bmatrix} 1.48 \\ 1.856 \\ 2.3472 \end{bmatrix}}_{\mathbf{y}}.
$$

Using the Gaussian elimination method to solve these equations, we find
$a_0 = 1.0736$, $a_1 = 0.44$, $a_2 = 1.44$. Hence the interpolating quadratic
polynomial is

$$
y = 1.0736 + 0.44x + 1.44x^2.
$$

So

$$
y(0.47) = 1.598496.
$$


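Procedure 3.1 Polynomial interpolation

To determine the interpolating polynomial of degree $n$

$$
y = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n,
$$

through $n + 1$ data points $(x_i, y_i)$, $i = 0, 1, \ldots, n$, proceed as follows.

Solve, for the coefficients $a_0, a_1, \ldots, a_n$, the system of equations $X\mathbf{a} = \mathbf{y}$
given by

$$
\begin{bmatrix}
1 & x_0 & x_0^2 & \cdots & x_0^n \\
1 & x_1 & x_1^2 & \cdots & x_1^n \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^n
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{bmatrix}
=
\begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_n \end{bmatrix}.
$$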
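Procedure 3.1 can be sketched in Python as follows. This is a minimal illustration rather than the course software: the helper names `solve` and `interpolating_polynomial` are hypothetical, and the elimination routine always swaps in the row with the largest pivot (partial pivoting), which is a slightly different rule from the "essential row interchanges" used in the text.

```python
def solve(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]        # augmented matrix A|b
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0:
            raise ValueError("matrix is not invertible")
        M[col], M[pivot] = M[pivot], M[col]               # row interchange
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                        # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def interpolating_polynomial(points):
    """Coefficients a_0, ..., a_n of the degree-n polynomial through n + 1 points."""
    n = len(points) - 1
    X = [[x ** k for k in range(n + 1)] for x, _ in points]   # rows 1, x, x^2, ...
    y = [yi for _, yi in points]
    return solve(X, y)


# Data of Example 3.2, then the estimate at x = 0.47.
coeffs = interpolating_polynomial([(0.4, 1.48), (0.6, 1.856), (0.8, 2.3472)])
value = sum(a * 0.47 ** k for k, a in enumerate(coeffs))
print([round(a, 4) for a in coeffs], round(value, 6))
```

With the data of Example 3.2 the rounded output should agree with the coefficients 1.0736, 0.44, 1.44 and the estimate 1.598496 quoted there.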


There are a number of questions that should be asked at this stage. 


(a) Will we always obtain a unique polynomial? 


The answer is yes, provided that all the x; values are different. 


(b) How accurate is any estimate obtained from an interpolating polynomial? 


This depends on the accuracy of the data. For accurate data, it is often 
sufficient to obtain interpolating polynomials of increasing degree and 
then to look for consistency in the estimates. Estimates for values of x 
close to the data points are likely to be more accurate than those that 
are further away from the data points. Interpolation is, in general, more 
accurate than extrapolation. 


(c) What degree polynomial should we use?

This again depends on the accuracy of the data. In theory, if the data 
are very accurate, then we can use polynomials of high degree. In prac- 
tice, as you will see in the computing activities, high-degree interpolat- 
ing polynomials often oscillate rapidly, which may cause difficulties. A 
sensible strategy is to start with a low-degree polynomial and increase 
the degree while looking for an appropriate level of consistency in the 
estimates. 


(d) Which points should we use? 
The best strategy, when using an interpolating polynomial of degree $n$, is
to select the $n + 1$ points which are closest to the value of $x$ for which you
want to estimate the value of the underlying function. Unfortunately, if
you need estimates at several different points, this might involve
calculating several different interpolating polynomials of degree $n$, each based
on a different subset of $n + 1$ points selected from the data points. A
sensible compromise might be to use a different interpolating polynomial
of degree $n$ for each subinterval $x_i \le x \le x_{i+1}$, based on the $n + 1$ data
points closest to this subinterval.


The value for $y(0.47)$ found in Example 3.2 is fairly close to the solution
found in Example 3.1.


Table 3.1 provides only an approximate solution to the differential equation
at the specified values of $x_i$, and this illustrates a case where the initial
data do not represent values of the 'true' function. The accuracy of the
interpolated values is then limited by the accuracy of the values of $y_i$.


Exercise 3.2 


Determine the quadratic polynomial that passes through the data points 
(0.2, 1.2), (0.4, 1.48) and (0.6, 1.856) from Table 3.1. Use it to estimate the 
solution of the initial-value problem dy/dx = x + y, y(0) = 1 at x = 0.47. Is 
this estimate likely to be more accurate than that found in Example 3.2? 


3.2 Least squares approximations 


In the previous subsection we described a method of finding the interpolating 
polynomial of degree n that passes through n+ 1 data points. However, in 
many practical problems, where the data are subject to random errors, it is 
often preferable to approximate a set of data points using a polynomial of 
low degree that passes close to, rather than through, the data points. We 
may, for example, attempt to find the ‘best’ straight line corresponding to 
a set of data points. 


Such problems can arise in modelling, where we may have obtained a rela- 
tionship between two variables and wish to test this relationship by com- 
paring it with experimental results. For example, suppose that a set of 
measurements has produced the values shown in Table 3.2. Looking at a 
plot of these values (see Figure 3.2), it is clear that, within the range of 
possible experimental error, a linear relationship exists between y and x.
The question arises as to which line gives the best fit. 


We are looking for a relationship of the form $y = a_0 + a_1 x$. Suppose that
we write the equations as

$$
\begin{aligned}
a_0 + a_1 &= 0.9, \\
a_0 + 2a_1 &= 2.1, \\
a_0 + 3a_1 &= 2.9, \\
a_0 + 4a_1 &= 4.1.
\end{aligned}
$$

There are four equations and only two unknowns: this is an example of an
overdetermined system of equations, i.e. a system where the number of
equations is greater than the number of unknowns. We would be extremely
lucky if there were values of $a_0$ and $a_1$ such that all the equations were
satisfied exactly. It is more usual to find that the system has no solution,
i.e. we have an inconsistent system of equations. Since the data we are
considering here are subject to experimental error, we cannot expect to find
values of $a_0$ and $a_1$ that will satisfy exactly all the equations.


Suppose that we measure the deviation $d_i$ as the vertical ($y$) distance of each
data point from the straight line $y = a_0 + a_1 x$, whose equation we wish to
find, so that

$$
d_i = (a_0 + a_1 x_i) - y_i \quad (i = 1, 2, 3, 4). \tag{3.4}
$$

We wish to find the line that is 'closest', in some sense, to our set of data
points, and we choose the sum of the squares of the deviations as our measure
of 'closeness'. (The sum of the deviations would not do, because negative
deviations would cancel out positive deviations.)


For the least squares method we seek the straight line that minimizes the
sum of the squares of these deviations. Thus, for our experimental data, we
wish to minimize $d_1^2 + d_2^2 + d_3^2 + d_4^2$, and this involves solving a system of
linear equations.


In statistics, this is called linear regression.


Table 3.2

x    1     2     3     4
y    0.9   2.1   2.9   4.1


Figure 3.2


In other words, we seek the values of $a_0$ and $a_1$ that minimize this sum
of squares.
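We can write Equations (3.4) in vector form as $\mathbf{d} = X\mathbf{a} - \mathbf{y}$, where

$$
\mathbf{d} = [d_1 \;\, d_2 \;\, d_3 \;\, d_4]^T, \quad
X = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \end{bmatrix}^T, \quad
\mathbf{a} = [a_0 \;\, a_1]^T \quad\text{and}\quad
\mathbf{y} = [0.9 \;\, 2.1 \;\, 2.9 \;\, 4.1]^T.
$$

From the definition of the transpose, we know that

$$
\mathbf{d}^T\mathbf{d} = d_1^2 + d_2^2 + d_3^2 + d_4^2,
$$

and this is the sum that we are trying to minimize with respect to $a_0$ and $a_1$.
Writing

$$
\mathbf{d}^T = (X\mathbf{a} - \mathbf{y})^T = (X\mathbf{a})^T - \mathbf{y}^T = \mathbf{a}^T X^T - \mathbf{y}^T,
$$

we have

$$
\begin{aligned}
\mathbf{d}^T\mathbf{d} &= (\mathbf{a}^T X^T - \mathbf{y}^T)(X\mathbf{a} - \mathbf{y}) \\
&= \mathbf{a}^T X^T X \mathbf{a} - \mathbf{a}^T X^T \mathbf{y} - \mathbf{y}^T X \mathbf{a} + \mathbf{y}^T\mathbf{y} \\
&= \mathbf{a}^T X^T X \mathbf{a} - 2\mathbf{a}^T X^T \mathbf{y} + \mathbf{y}^T\mathbf{y}.
\end{aligned}
$$

For our particular example, this equation is given by

$$
\mathbf{d}^T\mathbf{d} =
\begin{bmatrix} a_0 & a_1 \end{bmatrix}
\begin{bmatrix} 4 & 10 \\ 10 & 30 \end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \end{bmatrix}
- 2 \begin{bmatrix} a_0 & a_1 \end{bmatrix}
\begin{bmatrix} 10 \\ 30.2 \end{bmatrix}
+ 30.44.
$$

This is a function of the two variables $a_0$ and $a_1$, and we want to minimize
this function with respect to $a_0$ and $a_1$. We do not yet have the mathematics
to explain how to minimize a function of two variables; we postpone that
explanation until Unit 12. However, it transpires that the vector $\mathbf{a}$ that
minimizes the expression $\mathbf{a}^T X^T X \mathbf{a} - 2\mathbf{a}^T X^T \mathbf{y} + \mathbf{y}^T\mathbf{y}$ satisfies

$$
(X^T X)\mathbf{a} = X^T\mathbf{y}.
$$

Thus we want to solve

$$
\begin{bmatrix} 4 & 10 \\ 10 & 30 \end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \end{bmatrix}
=
\begin{bmatrix} 10 \\ 30.2 \end{bmatrix},
$$

and this has solution $a_0 = -0.1$, $a_1 = 1.04$. So the least squares straight line
is $y = -0.1 + 1.04x$.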


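Procedure 3.2 Least squares straight line


To determine the least squares straight line approximation of the form
$y = a_0 + a_1 x$, given $n + 1$ data points $(x_0, y_0), (x_1, y_1), \ldots, (x_n, y_n)$,
where $n > 1$, proceed as follows.


(a) Write down the matrix $X$ and the vectors $\mathbf{a}$ and $\mathbf{y}$ given by

$$
X = \begin{bmatrix} 1 & x_0 \\ 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}, \quad
\mathbf{a} = \begin{bmatrix} a_0 \\ a_1 \end{bmatrix}, \quad
\mathbf{y} = \begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_n \end{bmatrix}.
$$

(b) Solve the pair of linear equations represented by $(X^T X)\mathbf{a} = X^T\mathbf{y}$
to determine the vector $\mathbf{a}$, whose elements are the coefficients of
the straight line approximation.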
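Procedure 3.2 can be sketched in Python as shown below. For a straight line, $X^T X$ and $X^T\mathbf{y}$ reduce to a handful of sums, so the sketch forms those sums and solves the resulting pair of equations with the $2 \times 2$ inverse formula. The function name `least_squares_line` and the variable names are illustrative assumptions; `m` denotes the number of data points (the text's $n + 1$).

```python
def least_squares_line(points):
    """Return (a0, a1) minimising the sum of squared vertical deviations."""
    m = len(points)                          # number of data points (n + 1 in the text)
    sx = sum(x for x, _ in points)
    sxx = sum(x * x for x, _ in points)
    sy = sum(y for _, y in points)
    sxy = sum(x * y for x, y in points)
    # Normal equations (X^T X) a = X^T y with
    #   X^T X = [[m, sx], [sx, sxx]]  and  X^T y = [sy, sxy].
    det = m * sxx - sx * sx                  # zero if all the x values coincide
    a0 = (sxx * sy - sx * sxy) / det
    a1 = (m * sxy - sx * sy) / det
    return a0, a1


# Data of Table 3.2.
a0, a1 = least_squares_line([(1, 0.9), (2, 2.1), (3, 2.9), (4, 4.1)])
print(round(a0, 4), round(a1, 4))  # -0.1 1.04
```

Note that the symmetry of $X^T X$ is built into the code: the off-diagonal entry `sx` is computed once and used in both positions.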


Exercise 3.3 


Find the least squares straight line for the points (1,2), (2,2), (2,2) and 
(3,4). Sketch the graph of your solution. 
Strictly, $\mathbf{d}^T\mathbf{d}$ is a matrix of order $1 \times 1$, but it is common
practice to treat such matrices as numbers.


Since $\mathbf{y}^T X \mathbf{a}$ is a $1 \times 1$ matrix, we have
$\mathbf{y}^T X \mathbf{a} = (\mathbf{y}^T X \mathbf{a})^T = \mathbf{a}^T X^T \mathbf{y}$.


A useful check on the calculations is that the matrix $X^T X$ is symmetric,
since $(X^T X)^T = X^T (X^T)^T = X^T X$.


Exercise 3.4 

Find the least squares straight line for the points (—1,—1), (0,0) and (1, 1), 
and comment on the accuracy of your solution. 

Exercise 3.5 


Find the least squares straight line for the points (1,1), (2,4), (3,9) and 
(4,16), and comment on the accuracy of your solution. 


End-of-section Exercise 


Exercise 3.6 
You are given the data points (—1,0), (0,—1), (1,0) and (2,1). 


(a) Find the least squares straight line approximation, and evaluate the
least squares deviation $\mathbf{d}^T\mathbf{d}$.


(b) Find the interpolating cubic polynomial. 


4 Ill-conditioning 


In this section we examine briefly a difficulty that may occur when we
attempt to find the numerical solution to a given problem. This arises because
some problems are inherently unstable in the sense that very small changes
to the input data (due perhaps to experimental errors or rounding errors)
may dramatically alter the output numerical values. Such problems are said
to be ill-conditioned.


Such changes 'perturb' the data.


Problems that are not ill-conditioned are said to be well-conditioned.


4.1 Ill-conditioning in practice 


In this subsection we use examples to help us define what we mean by ill- 
conditioning for a system of linear equations Ax = b. A proper analysis of 
ill-conditioning for such a system would include a discussion of the effect of 
small changes to the coefficient matrix A, but, to simplify the theory, we 
discuss only the effect of small changes to the right-hand-side vector b, and 
we assume that A is exact. 


In earlier courses, we introduced the idea of absolute error: for real numbers,
the error in an estimate $\bar{x}$ of the exact value $x$ is $\bar{x} - x$, and the absolute
error is $|\bar{x} - x|$. We need to extend this idea to vectors. There are a number
of ways of doing this; we shall do it by defining the norm of a vector
$\mathbf{x} = [x_1 \;\, x_2 \;\, \cdots \;\, x_n]^T$ as the magnitude of the element of largest
magnitude in $\mathbf{x}$. Thus the norm of $\mathbf{x}$, using the notation $\|\mathbf{x}\|$, is

$$
\|\mathbf{x}\| = \max_{i = 1, \ldots, n} |x_i|.
$$

For example, if $\mathbf{x} = [2 \;\, {-3} \;\, 1]^T$, then $\|\mathbf{x}\| = \max\{2, 3, 1\} = 3$.


See Subsection 4.2 of the Handbook.


Suppose that we have two vectors, $\mathbf{x}$ and $\bar{\mathbf{x}}$, where $\bar{\mathbf{x}}$ is an estimate of the
exact vector $\mathbf{x}$. The change in $\mathbf{x}$ is $\delta\mathbf{x} = \bar{\mathbf{x}} - \mathbf{x}$, and the absolute change
is $\|\delta\mathbf{x}\| = \|\bar{\mathbf{x}} - \mathbf{x}\|$.


We prefer to discuss changes here rather than errors, since we are making
small changes to the data to see the effect on the solution.
Example 4.1

Suppose that $\mathbf{x} = [2 \;\, {-3} \;\, 1]^T$, and that $\bar{\mathbf{x}} = [2.02 \;\, {-3.11} \;\, 1.03]^T$ is an
approximation to $\mathbf{x}$. Compute the change and the absolute change in $\mathbf{x}$.

Solution


The change is $\delta\mathbf{x} = \bar{\mathbf{x}} - \mathbf{x} = [0.02 \;\, {-0.11} \;\, 0.03]^T$. The absolute change is
$\|\delta\mathbf{x}\| = \max\{0.02, 0.11, 0.03\} = 0.11$.
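The norm used in this section and the calculation of Example 4.1 can be sketched in Python as follows. The function name `norm` and the variable names are chosen for this illustration only.

```python
def norm(v):
    """Norm used in this section: the largest magnitude among the elements of v."""
    return max(abs(e) for e in v)


x = [2, -3, 1]                   # exact vector of Example 4.1
x_est = [2.02, -3.11, 1.03]      # its approximation
dx = [e - t for e, t in zip(x_est, x)]   # the change, element by element

print(norm(x))             # 3
print(round(norm(dx), 2))  # 0.11
```

The rounding in the last line only hides floating-point noise in the subtraction; the underlying value is 0.11.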


*Exercise 4.1 


Determine the absolute change in the approximation $\bar{\mathbf{x}} = [3.04 \;\, 2.03 \;\, 0.95]^T$
to the exact vector $\mathbf{x} = [3 \;\, 2 \;\, 1]^T$.


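In discussing ill-conditioning, we are interested in the solution $\mathbf{x}$ of the
equation

$$
A\mathbf{x} = \mathbf{b},
$$

and, in particular, in how small changes in $\mathbf{b}$ give rise to changes in $\mathbf{x}$.

The solution may be written as $\mathbf{x} = A^{-1}\mathbf{b}$, which we can regard as a linear
transformation of $\mathbf{b}$ to $\mathbf{x}$. (We assume that $A$ is invertible.) If we allow each
element of $\mathbf{b}$ to change by the small amount $\pm\varepsilon$, forming the vector
$\bar{\mathbf{b}} = \mathbf{b} + \delta\mathbf{b}$, then the solution we obtain will be $\bar{\mathbf{x}} = A^{-1}(\mathbf{b} + \delta\mathbf{b})$, where
the change is

$$
\delta\mathbf{x} = \bar{\mathbf{x}} - \mathbf{x} = A^{-1}(\mathbf{b} + \delta\mathbf{b}) - A^{-1}\mathbf{b} = A^{-1}\,\delta\mathbf{b}. \tag{4.1}
$$

If $\|\delta\mathbf{x}\|$ is large compared to $\|\delta\mathbf{b}\|$, then we know that the system of equations
is ill-conditioned.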
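Example 4.2


Consider the equation

$$
\begin{bmatrix} 1 & -2 \\ 1 & -1.9 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
=
\begin{bmatrix} -1 \\ -0.8 \end{bmatrix}. \tag{4.2}
$$

If the values on the right-hand side are changed by $\pm 0.1$, how big a change
in the solution might arise?


Solution

We have

$$
A = \begin{bmatrix} 1 & -2 \\ 1 & -1.9 \end{bmatrix},
\quad\text{so}\quad
A^{-1} = \frac{1}{0.1}\begin{bmatrix} -1.9 & 2 \\ -1 & 1 \end{bmatrix}
= \begin{bmatrix} -19 & 20 \\ -10 & 10 \end{bmatrix}.
$$

Thus the solution of Equation (4.2) is

$$
\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} -19 & 20 \\ -10 & 10 \end{bmatrix}
\begin{bmatrix} -1 \\ -0.8 \end{bmatrix}
= \begin{bmatrix} 3 \\ 2 \end{bmatrix}.
$$

Now taking $\varepsilon = 0.1$, the possible values for $\bar{\mathbf{b}}$ are

$$
\begin{bmatrix} -1.1 \\ -0.9 \end{bmatrix}, \quad
\begin{bmatrix} -1.1 \\ -0.7 \end{bmatrix}, \quad
\begin{bmatrix} -0.9 \\ -0.9 \end{bmatrix}, \quad
\begin{bmatrix} -0.9 \\ -0.7 \end{bmatrix},
$$

with corresponding values for $\delta\mathbf{b}$

$$
\begin{bmatrix} -0.1 \\ -0.1 \end{bmatrix}, \quad
\begin{bmatrix} -0.1 \\ 0.1 \end{bmatrix}, \quad
\begin{bmatrix} 0.1 \\ -0.1 \end{bmatrix}, \quad
\begin{bmatrix} 0.1 \\ 0.1 \end{bmatrix}.
$$

Note that $\|\delta\mathbf{b}\| = 0.1$ for each of these vectors.

Applying $A^{-1}$ to each $\delta\mathbf{b}$ above yields, respectively, the following values
of $\delta\mathbf{x}$:

$$
\begin{bmatrix} -0.1 \\ 0 \end{bmatrix}, \quad
\begin{bmatrix} 3.9 \\ 2 \end{bmatrix}, \quad
\begin{bmatrix} -3.9 \\ -2 \end{bmatrix}, \quad
\begin{bmatrix} 0.1 \\ 0 \end{bmatrix}.
$$

Looking at the norm of $\delta\mathbf{x}$ in each case, we see that the middle two vectors
have

$$
\|\delta\mathbf{x}\| = 3.9,
$$

which is 39 times bigger than the norm of $\delta\mathbf{b}$. Thus a change of magnitude
0.1 in the entries of $\mathbf{b}$ may cause a change of magnitude 3.9 in the entries of
the solution.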
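The four perturbations considered in Example 4.2 can be enumerated mechanically. The Python sketch below applies $A^{-1}$ to each corner perturbation $\delta\mathbf{b}$ and records the ratio $\|\delta\mathbf{x}\| / \|\delta\mathbf{b}\|$; the helper names `apply` and `norm` are illustrative, and the inverse matrix is typed in directly from the example rather than computed.

```python
from itertools import product

A_inv = [[-19.0, 20.0], [-10.0, 10.0]]   # inverse of [[1, -2], [1, -1.9]]
eps = 0.1


def apply(M, v):
    """Matrix-vector product Mv for a matrix stored as a list of rows."""
    return [sum(m * e for m, e in zip(row, v)) for row in M]


def norm(v):
    """Largest magnitude among the elements of v."""
    return max(abs(e) for e in v)


ratios = []
for db in product((-eps, eps), repeat=2):    # the four corners (+/-eps, +/-eps)
    dx = apply(A_inv, db)                    # Equation (4.1): dx = A^{-1} db
    ratios.append(norm(dx) / norm(db))
    print(db, [round(e, 2) for e in dx], round(ratios[-1], 2))

print(round(max(ratios), 2))  # 39.0
```

The corners are visited in the same order as in the example, so the second and third lines of output correspond to the "middle two vectors" with $\|\delta\mathbf{x}\| = 3.9$.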


A geometric interpretation will help to make this clearer. If, in Example 4.2,
we change the elements of $\mathbf{b}$ by $\pm 0.1$, then the changed vector $\bar{\mathbf{b}}$ lies at a
corner of the square with centre $\mathbf{b} = [-1 \;\, {-0.8}]^T$ shown in Figure 4.1 (which
is not drawn to scale).


Figure 4.1 (square centred on $(-1, -0.8)$, with labelled corners $(-1.1, -0.7)$, $(-1.1, -0.9)$ and $(-0.9, -0.9)$)


Such a point $\bar{\mathbf{b}}$ is mapped to a point $\bar{\mathbf{x}}$ under the linear transformation
represented by $A^{-1}$, i.e. $\bar{\mathbf{x}} = A^{-1}\bar{\mathbf{b}}$, so the point $\bar{\mathbf{x}}$ must lie at a corner of
the parallelogram shown in Figure 4.2. (You saw in Subsection 2.2 how
linear transformations map squares to parallelograms.)


Figure 4.2 (parallelogram in the $(x, y)$-plane, with the points $(3, 2)$ and $(6.9, 4)$ marked)


The greatest change in $\mathbf{x}$ occurs when $\bar{\mathbf{x}}$ is at a vertex furthest from (the exact
solution) $\mathbf{x} = [3 \;\, 2]^T$ in Figure 4.2, in other words either at $\bar{\mathbf{x}} = [6.9 \;\, 4]^T$,
or at $\bar{\mathbf{x}} = [-0.9 \;\, 0]^T$.

The vertex $\bar{\mathbf{x}} = [6.9 \;\, 4]^T$ arises from choosing $\bar{\mathbf{b}} = [-1.1 \;\, {-0.7}]^T$, and
$\bar{\mathbf{x}} = [-0.9 \;\, 0]^T$ arises from choosing $\bar{\mathbf{b}} = [-0.9 \;\, {-0.9}]^T$.

In either case, $\|\delta\mathbf{x}\| = \|\bar{\mathbf{x}} - \mathbf{x}\| = 3.9$, as we have seen. So here we have a
situation in which a numerical change of 0.1 in the elements of $\mathbf{b}$ has caused
a change of 3.9 in an element of the solution. We would certainly regard
such a system of equations as ill-conditioned. It is the ratio of $\|\delta\mathbf{x}\|$ to $\|\delta\mathbf{b}\|$
that is relevant here. In this instance, we have found a point $\bar{\mathbf{b}}$ and its image
$\bar{\mathbf{x}}$ for which this ratio is $3.9/0.1 = 39$ (a rather large number). Once we have
found one instance of $\bar{\mathbf{b}}$ for which the ratio is large, we say that the system
of equations is ill-conditioned.
We define the absolute condition number $k_a$ for the problem of solving
$A\mathbf{x} = \mathbf{b}$, when $\mathbf{b}$ is subject to small changes of up to $\varepsilon$ in magnitude, to be
the largest possible value of the ratio of $\|\delta\mathbf{x}\|$ to $\|\delta\mathbf{b}\|$, i.e.

$$
k_a = \max_{\|\delta\mathbf{b}\| \le \varepsilon} \frac{\|\delta\mathbf{x}\|}{\|\delta\mathbf{b}\|}.
$$

Because $A^{-1}$ represents a linear transformation, this largest value occurs
when the perturbed vector $\bar{\mathbf{b}}$ lies at a corner of the square of side $2\varepsilon$ centred
on $\mathbf{b}$. In the example above, we have shown that $k_a = 39$, which tells us
that the system of equations is absolutely ill-conditioned.

We do not prove this here.

We shall discuss methods of determining $k_a$ and specific criteria for absolute
ill-conditioning shortly.


The cause of the ill-conditioning can be deduced by a re-examination of
Equation (4.2). Solving this equation corresponds to finding the point of
intersection of the two lines $x - 2y = -1$ and $x - 1.9y = -0.8$. These lines
are almost parallel, so a small change in $\mathbf{b}$ can give rise to a large change in
the solution.


These ideas can be applied to many problems other than those involving
systems of linear equations.


Criteria for absolute ill-conditioning


Suppose that small changes are made in the data for a problem. The
problem is absolutely ill-conditioned if it is possible for the absolute
change in the solution to be significantly larger than the absolute
change in the data.

Alternatively, we can think of small errors or uncertainties in the data giving
rise to significantly larger errors or uncertainties in the solution.


Normally, the interpretation of significantly larger is dependent on the
context. However, for the sake of clarity and certainty, we shall adopt
the following course convention. A problem is judged to be:


• absolutely well-conditioned if the absolute condition number $k_a$ for
the problem is less than 5;


• neither absolutely well-conditioned nor absolutely ill-conditioned if
$k_a$ is greater than 5, but less than 10;


• absolutely ill-conditioned if $k_a$ is greater than 10.

Different numbers might be more appropriate for large systems of equations,
but 5 and 10 are suitable choices for systems of two or three linear equations.


For very large systems of equations, we may try to detect ill-conditioning
by making small changes in the data. If, for the changes we try, the changes
in the solution remain small, then we can say only that we have found no
evidence of ill-conditioning and that the problem may be well-conditioned.
(Changes that we have not tried might cause significantly larger changes in
the solution.) For small systems of equations, however, where it is feasible to
compute the inverse matrix $A^{-1}$, we can give a much better way of detecting
ill-conditioning or well-conditioning.


From Equation (4.1) we have

$$
\delta\mathbf{x} = A^{-1}\,\delta\mathbf{b}.
$$

Hence any change in the right-hand-side vector $\mathbf{b}$ will be multiplied by $A^{-1}$
to give the change in the solution.


To see how this works, we return to the linear problem in Example 4.2:

$$
\begin{bmatrix} 1 & -2 \\ 1 & -1.9 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
=
\begin{bmatrix} -1 \\ -0.8 \end{bmatrix},
\quad\text{where}\quad
A^{-1} = \begin{bmatrix} -19 & 20 \\ -10 & 10 \end{bmatrix}.
$$

The argument will be clearer if we let $\delta\mathbf{b} = [\varepsilon_1 \;\, \varepsilon_2]^T$, where $\varepsilon_1 = \pm\varepsilon$ and
$\varepsilon_2 = \pm\varepsilon$. Then we have

$$
\delta\mathbf{x} = A^{-1}\,\delta\mathbf{b}
= \begin{bmatrix} -19 & 20 \\ -10 & 10 \end{bmatrix}
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \end{bmatrix}
= \begin{bmatrix} -19\varepsilon_1 + 20\varepsilon_2 \\ -10\varepsilon_1 + 10\varepsilon_2 \end{bmatrix}.
$$

We can see that the largest element of $\delta\mathbf{x}$ occurs when $\varepsilon_1 = -\varepsilon$ and $\varepsilon_2 = \varepsilon$,
giving $\delta\mathbf{x} = [39\varepsilon \;\, 20\varepsilon]^T$ and $\|\delta\mathbf{x}\| = 39\varepsilon$.

Notice that the sign of $\varepsilon_i$ is chosen to give the largest possible result
for $\|\delta\mathbf{x}\|$.

Now $\|\delta\mathbf{b}\| = \varepsilon$, therefore $k_a = 39$, as we observed above. It is no coincidence
that this is also the maximum row sum of the magnitudes of the elements
of $A^{-1}$.


This example illustrates the following result (which we do not prove here). 


Absolute condition number of an invertible n x n matrix 


The absolute condition number for small changes to the right-hand-side
vector $\mathbf{b}$ in the solution of $A\mathbf{x} = \mathbf{b}$ is given by the maximum row sum
of the magnitudes of the elements of $A^{-1}$, i.e.

$$
k_a = \max_{i} \left\{ |c_{i1}| + |c_{i2}| + \cdots + |c_{in}| \right\},
$$

where the $c_{ij}$ are the elements of the matrix $C = A^{-1}$.
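The row-sum rule above, together with the course convention for judging conditioning, can be sketched in Python. The function names are hypothetical, and one detail is an assumption of this sketch: the convention in the text does not say how to treat a value of exactly 5 or exactly 10, so the code below reports such borderline values as "neither".

```python
def absolute_condition_number(A_inv):
    """Maximum row sum of the magnitudes of the elements of the inverse matrix."""
    return max(sum(abs(c) for c in row) for row in A_inv)


def classify(k_a):
    """Course convention with thresholds 5 and 10 (borderline values: an assumption)."""
    if k_a < 5:
        return "absolutely well-conditioned"
    if k_a > 10:
        return "absolutely ill-conditioned"
    return "neither"


# Inverse matrix from Example 4.2.
k_a = absolute_condition_number([[-19, 20], [-10, 10]])
print(k_a, classify(k_a))  # 39 absolutely ill-conditioned
```

The function takes the inverse matrix as its input, which reflects the point made in the text: this way of determining $k_a$ presupposes that $A^{-1}$ has already been calculated.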


*Exercise 4.2 


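Determine the absolute condition number for small changes to the
right-hand-side vector $\mathbf{b}$ in the solution of $A\mathbf{x} = \mathbf{b}$ when $A$ and its inverse are
given by

$$
A = \begin{bmatrix} 6 & 4 & 3 \\ 7.5 & 6 & 5 \\ 10 & 7.5 & 6 \end{bmatrix}, \quad
A^{-1} = \begin{bmatrix} 6 & 6 & -8 \\ -20 & -24 & 30 \\ 15 & 20 & -24 \end{bmatrix}.
$$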


In order to determine the conditioning of a problem, we have had to do what 
we had hoped to avoid: calculate the inverse of the matrix A. However, this 
may be the price we have to pay if we are worried that our problem may be 
sensitive to small changes in the data. 


The cure for ill-conditioning is fairly drastic. We can abandon the current
equations and try to find some more data, or even abandon the model.

Remember also that we have not discussed the effect of changes to the
entries in $A$.


*Exercise 4.3


For each of the following examples, determine the absolute condition number
and comment on the conditioning of the problem.

These examples are also considered in Activity 5.2(a).
5 Computing activities


In this section there are two computing activities involving the computer 
algebra package for the course. The first activity is designed to help you 
consolidate the material in the first three sections of the unit, while the 
second deals with ill-conditioning. 


Use your computer to carry out the following activities. 


*Activity 5.1 


Use the Gaussian elimination method with essential row interchanges to 
solve Ax = b to determine x for each of the following examples. Compare 
this solution with the solution obtained using the computer algebra package’s 
linear equation solver. Compare det A and det U, and comment on your 
findings. 


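(a) $A = \begin{bmatrix} 1 & -4 & 2 \\ 3 & -2 & 3 \\ 8 & -2 & 9 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} -9 \\ 7 \\ 34 \end{bmatrix}$

(b) $A = \begin{bmatrix} 0 & 10 & -3 \\ 1 & -4 & 2 \\ 0 & 0 & 2 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} 34 \\ -9 \\ 4 \end{bmatrix}$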

1 4 -8 2 
(c) A=|]1 2 2], b=]|5 

2 2 9 ¢ 

1 4 -8 2 
(4) A=]1 2 2], b=| 5 

2 2 9 13 
*Activity 5.2 


(a) This part of the activity allows you to explore absolute ill-conditioning 
graphically for systems of two equations in two unknowns. 


For each of the following systems, investigate whether perturbations
(small changes) of $\pm\varepsilon$ to each of the elements of $\mathbf{b}$ cause significantly
larger changes in the solution. Hence determine whether or not the
problem is absolutely ill-conditioned.


i) ee i fale 


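(b) Consider the problem in Example 1.2, where the coefficient matrix $A$
and the right-hand-side vector $\mathbf{b}$ are

$$
A = \begin{bmatrix} 1 & -4 & 2 \\ 3 & -2 & 3 \\ 8 & -2 & 9 \end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} -9 \\ 7 \\ 34 \end{bmatrix}.
$$

Investigate whether this problem suffers from ill-conditioning. You may
like to try $\varepsilon = 0.1$ for all the elements of $\mathbf{b}$ and then a different $\varepsilon$ for
each element of $\mathbf{b}$.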
Outcomes


After studying this unit you should be able to:

• understand how a system of linear equations can be represented using
matrices;

• solve 2 × 2 and 3 × 3 systems of linear equations by the Gaussian
elimination method, using essential row interchanges where necessary;

• add, subtract and multiply matrices of suitable sizes, and multiply a
matrix by a scalar;

• understand the terms transpose of a matrix, symmetric matrix, diagonal
matrix, upper triangular matrix, lower triangular matrix, zero matrix,
identity matrix, inverse matrix, invertible matrix and non-invertible
matrix;

• understand that a matrix can be used to represent a linear
transformation, and know what this means geometrically for a 2 × 2 matrix;

• find the inverse of a 2 × 2 matrix;

• evaluate the determinant of an n × n matrix, by hand when n = 2 or 3,
and using the Gaussian elimination method for n > 3;

• use the determinant of a matrix to evaluate cross products, areas and
volumes;

• find the interpolating polynomial of degree n passing through n + 1 data
points, by hand for n ≤ 3;

• determine the least squares straight line approximating a set of data
points;

• understand what is meant by absolute ill-conditioning for a system of
linear equations, and know how to determine whether a particular problem
is absolutely ill-conditioned;

• use the computer algebra package for the course to solve many of the
problems posed in this unit.
Solutions to the exercises


Section 1


1.1 Stage 1(a) We eliminate $x_1$ using $E_2 - 5E_1$,
which gives

$$
-3x_2 + 7x_3 = 10, \qquad E_{2a}
$$

followed by $E_3 - 4E_1$, which gives

$$
-6x_2 + x_3 = 7. \qquad E_{3a}
$$

Stage 1(b) We eliminate $x_2$ using $E_{3a} - 2E_{2a}$, which
gives

$$
-13x_3 = -13. \qquad E_{3b}
$$

Stage 2 The solution is obtained by back substitution.
From $E_{3b}$, we find $x_3 = 1$. Substituting this into
$E_{2a}$ gives $-3x_2 + 7 = 10$, hence $x_2 = -1$. From $E_1$,
$x_1 - 1 - 1 = 2$, hence $x_1 = 4$. So the solution is

$$
x_1 = 4, \quad x_2 = -1, \quad x_3 = 1.
$$
3-5] 8 

1.2 (a) ato= [3 ~§| 5] 
1 2 Oj] 38 

(b) Ajb=|]2 -1 1] 1 
0 1 -1l)-1 


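1.3 The augmented matrix representing these equations is as follows.

$$
\left[\begin{array}{cc|c} 1 & 2 & 4 \\ 3 & -1 & 5 \end{array}\right]
\begin{matrix} R_1 \\ R_2 \end{matrix}
$$

Stage 1 We reduce to zero the element below the leading diagonal.

$$
\begin{matrix} \\ R_2 - 3R_1 \end{matrix}
\left[\begin{array}{cc|c} 1 & 2 & 4 \\ 0 & -7 & -7 \end{array}\right]
\begin{matrix} R_1 \\ R_{2a} \end{matrix}
$$

Stage 2 The equations represented by the new matrix are

$$
\begin{aligned}
x_1 + 2x_2 &= 4, \qquad E_1 \\
-7x_2 &= -7. \qquad E_{2a}
\end{aligned}
$$

From $E_{2a}$, we find $x_2 = 1$. Substituting this into $E_1$, we
obtain $x_1 + 2 = 4$. Hence $x_1 = 2$, giving the solution

$$
x_1 = 2, \quad x_2 = 1.
$$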


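1.4 The augmented matrix representing these equations is as follows.

$$
\left[\begin{array}{ccc|c} 1 & 1 & -1 & 2 \\ 5 & 2 & 2 & 20 \\ 4 & -2 & -3 & 15 \end{array}\right]
\begin{matrix} R_1 \\ R_2 \\ R_3 \end{matrix}
$$

Stage 1(a) We reduce the elements below the leading
diagonal in column 1 to zero.

$$
\begin{matrix} \\ R_2 - 5R_1 \\ R_3 - 4R_1 \end{matrix}
\left[\begin{array}{ccc|c} 1 & 1 & -1 & 2 \\ 0 & -3 & 7 & 10 \\ 0 & -6 & 1 & 7 \end{array}\right]
\begin{matrix} R_1 \\ R_{2a} \\ R_{3a} \end{matrix}
$$

Stage 1(b) We reduce the element below the leading
diagonal in column 2 to zero.

$$
\begin{matrix} \\ \\ R_{3a} - 2R_{2a} \end{matrix}
\left[\begin{array}{ccc|c} 1 & 1 & -1 & 2 \\ 0 & -3 & 7 & 10 \\ 0 & 0 & -13 & -13 \end{array}\right]
\begin{matrix} R_1 \\ R_{2a} \\ R_{3b} \end{matrix}
$$

Stage 2 The equations represented by the new matrix
are

$$
\begin{aligned}
x_1 + x_2 - x_3 &= 2, \qquad E_1 \\
-3x_2 + 7x_3 &= 10, \qquad E_{2a} \\
-13x_3 &= -13. \qquad E_{3b}
\end{aligned}
$$

From $E_{3b}$, we have $x_3 = 1$.

From $E_{2a}$, we have $-3x_2 + 7 = 10$, so $x_2 = -1$.

From $E_1$, we have $x_1 - 1 - 1 = 2$, so $x_1 = 4$.

Hence the solution is

$$
x_1 = 4, \quad x_2 = -1, \quad x_3 = 1,
$$

as we saw in Solution 1.1.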
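The two stages used in Solution 1.4 can be traced with a short Python sketch. It uses exact rational arithmetic from the standard library so that the intermediate rows match the hand calculation; the function names `eliminate` and `back_substitute` are illustrative, and the sketch assumes that no row interchanges are needed (every pivot is non-zero), which holds for this system.

```python
from fractions import Fraction


def eliminate(M):
    """Stage 1: reduce the augmented matrix M to upper triangular form."""
    n = len(M)
    for col in range(n):
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]          # assumes a non-zero pivot
            M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return M


def back_substitute(U):
    """Stage 2: solve the upper triangular system by back substitution."""
    n = len(U)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (U[i][n] - s) / U[i][i]
    return x


# Augmented matrix of Solution 1.4.
rows = [[1, 1, -1, 2], [5, 2, 2, 20], [4, -2, -3, 15]]
U = eliminate([[Fraction(v) for v in row] for row in rows])
solution = back_substitute(U)
print([[int(v) for v in row] for row in U])
print([int(v) for v in solution])  # [4, -1, 1]
```

The first line of output lists the rows $R_1$, $R_{2a}$ and $R_{3b}$ of the reduced matrix, so it can be compared directly with Stage 1(b).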


1.5 The graphs of the four pairs of equations are
sketched below.

[Sketches (a), (b), (c) and (d): each shows a pair of lines on x- and y-axes with origin O]


(a) The two lines are parallel, so there is no solution. 


(b) There is one solution, at the intersection of the two 
lines. 


(c) The two lines coincide, so there is an infinite num- 
ber of solutions, consisting of all the points that lie on 
the line. 


(d) There is one solution, at the intersection of the two 
lines. 


1.6 For the two lines to be parallel they must have the
same slope. Assuming that $b \ne 0$ (which then requires
$d \ne 0$), the first line has slope $-a/b$, while the second
line has slope $-c/d$. Hence the two lines are parallel if

$$
\text{either}\quad b = d = 0 \quad\text{or}\quad \frac{a}{b} = \frac{c}{d}.
$$


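1.7 (a) Write down the augmented matrix.

$$
A|\mathbf{b} =
\left[\begin{array}{ccc|c} 1 & -2 & 5 & 7 \\ 1 & 3 & -4 & 20 \\ 1 & 18 & -31 & 40 \end{array}\right]
\begin{matrix} R_1 \\ R_2 \\ R_3 \end{matrix}
$$

Stage 1(a) Reduce to zero in column 1.

$$
\begin{matrix} \\ R_2 - R_1 \\ R_3 - R_1 \end{matrix}
\left[\begin{array}{ccc|c} 1 & -2 & 5 & 7 \\ 0 & 5 & -9 & 13 \\ 0 & 20 & -36 & 33 \end{array}\right]
\begin{matrix} R_1 \\ R_{2a} \\ R_{3a} \end{matrix}
$$

Stage 1(b) Reduce to zero in column 2.

$$
\begin{matrix} \\ \\ R_{3a} - 4R_{2a} \end{matrix}
\left[\begin{array}{ccc|c} 1 & -2 & 5 & 7 \\ 0 & 5 & -9 & 13 \\ 0 & 0 & 0 & -19 \end{array}\right]
\begin{matrix} R_1 \\ R_{2a} \\ R_{3b} \end{matrix}
$$

Stage 2 Try to solve the equations represented by the
rows of the above matrix.

Since the rows of $A$ are linearly dependent, but the rows
of $A|\mathbf{b}$ are not, there is no solution.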


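(b) Write down the augmented matrix.

$$
A|\mathbf{b} =
\left[\begin{array}{ccc|c} 1 & -2 & 5 & 6 \\ 1 & 3 & -4 & 7 \\ 2 & 6 & -12 & 12 \end{array}\right]
\begin{matrix} R_1 \\ R_2 \\ R_3 \end{matrix}
$$

Stage 1(a) Reduce to zero in column 1.

$$
\begin{matrix} \\ R_2 - R_1 \\ R_3 - 2R_1 \end{matrix}
\left[\begin{array}{ccc|c} 1 & -2 & 5 & 6 \\ 0 & 5 & -9 & 1 \\ 0 & 10 & -22 & 0 \end{array}\right]
\begin{matrix} R_1 \\ R_{2a} \\ R_{3a} \end{matrix}
$$

Stage 1(b) Reduce to zero in column 2.

$$
\begin{matrix} \\ \\ R_{3a} - 2R_{2a} \end{matrix}
\left[\begin{array}{ccc|c} 1 & -2 & 5 & 6 \\ 0 & 5 & -9 & 1 \\ 0 & 0 & -4 & -2 \end{array}\right]
\begin{matrix} R_1 \\ R_{2a} \\ R_{3b} \end{matrix}
$$

Stage 2 Solve the equations by back substitution.

Since the rows of $A$ are linearly independent, there is a
unique solution. Back substitution gives

$$
x_1 = 5.7, \quad x_2 = 1.1, \quad x_3 = 0.5.
$$

(c) Write down the augmented matrix.

$$
A|\mathbf{b} =
\left[\begin{array}{ccc|c} 1 & -4 & 1 & 14 \\ 5 & -1 & -1 & 2 \\ 6 & 14 & -6 & -52 \end{array}\right]
\begin{matrix} R_1 \\ R_2 \\ R_3 \end{matrix}
$$

Stage 1(a) Reduce to zero in column 1.

$$
\begin{matrix} \\ R_2 - 5R_1 \\ R_3 - 6R_1 \end{matrix}
\left[\begin{array}{ccc|c} 1 & -4 & 1 & 14 \\ 0 & 19 & -6 & -68 \\ 0 & 38 & -12 & -136 \end{array}\right]
\begin{matrix} R_1 \\ R_{2a} \\ R_{3a} \end{matrix}
$$

Stage 1(b) Reduce to zero in column 2.

$$
\begin{matrix} \\ \\ R_{3a} - 2R_{2a} \end{matrix}
\left[\begin{array}{ccc|c} 1 & -4 & 1 & 14 \\ 0 & 19 & -6 & -68 \\ 0 & 0 & 0 & 0 \end{array}\right]
\begin{matrix} R_1 \\ R_{2a} \\ R_{3b} \end{matrix}
$$

Stage 2 Solve the equations by back substitution.

Since the rows of $A|\mathbf{b}$ are linearly dependent, there is
an infinite number of solutions. Putting $x_3 = k$, back
substitution gives the set of solutions

$$
x_1 = (5k - 6)/19, \quad x_2 = (6k - 68)/19, \quad x_3 = k.
$$

Here

$$
\begin{aligned}
R_{3b} &= R_{3a} - 2R_{2a} \\
&= (R_3 - 6R_1) - 2(R_2 - 5R_1) \\
&= R_3 - 2R_2 + 4R_1 = \mathbf{0},
\end{aligned}
$$

which is the required non-trivial linear combination of
the rows of $A|\mathbf{b}$ that gives a row of zeros.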


Section 2 
ate) sa=| 15 is Fal 
-1 2 -3 
-a=(7) 5 = 
tft, O23 S47 
(b) A+B=([17 5 +(—5) a 
_f2 5 10 
=lg 0 6]? 
_{1+(¢-2) 341 £7+(-6) 
BPO | 4.2(-5) (pes (03 | 


Thus (A +B)+C=A+(B+C). 


2.2 (a) A-B= E =| _ Ee | 


[33 S3)-[4 4 


(b) From part (a) we have B— A = —(A —B). 
2.3 (a) 24 5B =| 4 2} — [20 15 35


8 10 12 20 —25 
_ | = ait 35 
~|-12 35 12 
2 5 10 

(b) (A+B) =3/ 5 ae 


3 6 9 
s+ 3B =|) 15 | E 
B 
Thus 3(A + B) = 3A + 3B. 


2.4 (a) E 
(b) [2 |g |= [2 4] 


c) [2] [s 9 -41=[§ 5 <s| 


1 


(e) The product does not exist, because the left-hand
matrix has only 1 column, whereas the right-hand
matrix has 2 rows.


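2.5

$$
A\mathbf{x} =
\begin{bmatrix} 3 & -1 & 5 \\ 6 & 4 & 7 \\ 2 & -3 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
=
\begin{bmatrix} 3x_1 - x_2 + 5x_3 \\ 6x_1 + 4x_2 + 7x_3 \\ 2x_1 - 3x_2 \end{bmatrix}.
$$

The two matrices $A\mathbf{x}$ and $\mathbf{b}$ are equal only if

$$
\begin{aligned}
3x_1 - x_2 + 5x_3 &= 4, \\
6x_1 + 4x_2 + 7x_3 &= 5, \\
2x_1 - 3x_2 &= 6.
\end{aligned}
$$

Hence the equation $A\mathbf{x} = \mathbf{b}$ is equivalent to the system
of linear equations.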


ee ee OLE tae 
| 


ma EE JTS 


Thus $\mathbf{AB} \neq \mathbf{BA}$.
@) apyc=[7 ai|[t s]=[28 zo): 
amoy=[5 2]([2 allt 3]) 
=[5 a|[5 75] 
Be 
Thus (AB)C = A(BC). 
1 2 aha
(b) (A+B) 3 4/+|/-1 -4 
5 6 1 
T 
7 a 28 
ol ae (| ae a 
8 7 
fae Las a af 3 
aye eae alls ce al 
[3 2-3 
=e ee Fi 
Thus $(\mathbf{A} + \mathbf{B})^T = \mathbf{A}^T + \mathbf{B}^T$.
E 
i 2 
(c) (AC)T=[|3 4 E | 
5 6 
a 
2s 5. at: a7 
= ee P= N6 a ae 
LF 18 


c,r_[1 2])f/1 3 5]_[5 11 17 
Oe =i aE 4 Ate 12 fel 
Thus $(\mathbf{AC})^T = \mathbf{C}^T \mathbf{A}^T$.


2.8 (a) The matrix is upper triangular.

(b) The matrix is neither upper triangular nor lower
triangular and so cannot be diagonal.

(c) The matrix is upper triangular and lower triangular
and hence diagonal.


en 
2.9 (a) AB= E :| : 9 
9 9 


Since $\mathbf{AB} = \mathbf{I}$, it follows that $\mathbf{B} = \mathbf{A}^{-1}$.


i, <3. 21 L 28 
(b) AB=|-2 -5 1 y 2 4 
4 


lI 
oor 


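2.10 We form the 2 × 4 augmented matrix.

\[
\left(\begin{array}{cc|cc} a & b & 1 & 0 \\ c & d & 0 & 1 \end{array}\right)
\begin{array}{l} R_1 \\ R_2 \end{array}
\]

Going through the stages of Procedure 2.1 in turn we
obtain, for $a \neq 0$, the following.

\[
\begin{array}{l} {} \\ R_2 - \frac{c}{a} R_1 \end{array}
\left(\begin{array}{cc|cc} a & b & 1 & 0 \\ 0 & d - \frac{bc}{a} & -\frac{c}{a} & 1 \end{array}\right)
\begin{array}{l} R_1 \\ R_{2a} \end{array}
\]

\[
\begin{array}{l} {} \\ \frac{a}{ad - bc} R_{2a} \end{array}
\left(\begin{array}{cc|cc} a & b & 1 & 0 \\ 0 & 1 & -\frac{c}{ad - bc} & \frac{a}{ad - bc} \end{array}\right)
\begin{array}{l} R_1 \\ R_{2b} \end{array}
\]

\[
\begin{array}{l} R_1 - bR_{2b} \\ {} \end{array}
\left(\begin{array}{cc|cc} a & 0 & \frac{ad}{ad - bc} & -\frac{ab}{ad - bc} \\ 0 & 1 & -\frac{c}{ad - bc} & \frac{a}{ad - bc} \end{array}\right)
\begin{array}{l} R_{1a} \\ R_{2b} \end{array}
\]

\[
\begin{array}{l} \frac{1}{a} R_{1a} \\ {} \end{array}
\left(\begin{array}{cc|cc} 1 & 0 & \frac{d}{ad - bc} & -\frac{b}{ad - bc} \\ 0 & 1 & -\frac{c}{ad - bc} & \frac{a}{ad - bc} \end{array}\right)
\begin{array}{l} R_{1b} \\ R_{2b} \end{array}
\]

Hence

\[
\mathbf{A}^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.
\]

If $a = 0$, we must start by interchanging $R_1$ and $R_2$.

\[
\begin{array}{l} R_1 \leftrightarrow R_2 \\ {} \end{array}
\left(\begin{array}{cc|cc} c & d & 0 & 1 \\ 0 & b & 1 & 0 \end{array}\right)
\begin{array}{l} R_{1a} \\ R_{2a} \end{array}
\]

Again, going through the stages of Procedure 2.1 in
turn, noting that $bc \neq 0$ since $ad - bc \neq 0$, we obtain
the following.

\[
\begin{array}{l} {} \\ \frac{1}{b} R_{2a} \end{array}
\left(\begin{array}{cc|cc} c & d & 0 & 1 \\ 0 & 1 & \frac{1}{b} & 0 \end{array}\right)
\begin{array}{l} R_{1a} \\ R_{2b} \end{array}
\]

\[
\begin{array}{l} R_{1a} - dR_{2b} \\ {} \end{array}
\left(\begin{array}{cc|cc} c & 0 & -\frac{d}{b} & 1 \\ 0 & 1 & \frac{1}{b} & 0 \end{array}\right)
\begin{array}{l} R_{1b} \\ R_{2b} \end{array}
\]

Dividing $R_{1b}$ by $c$ then gives the same inverse matrix as substituting $a = 0$
into the inverse matrix found earlier.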


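2.11 (a) Since $\det \mathbf{A} = (7 \times 3) - (4 \times 5) = 1$,
\[
\mathbf{A}^{-1} = \begin{pmatrix} 3 & -4 \\ -5 & 7 \end{pmatrix}.
\]
(b) Since $\det \mathbf{A} = (6 \times 3) - (2 \times 9) = 0$, $\mathbf{A}^{-1}$ does not
exist.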
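
The determinant test and the inverse formula of Solution 2.10 can be combined in a short sketch. The function name `inverse_2x2` is hypothetical, integer entries are assumed so that exact fractions can be used, and the matrix used for part (b) in the usage note is only one arrangement consistent with the determinant shown there.

```python
from fractions import Fraction


def inverse_2x2(m):
    """Inverse of [[a, b], [c, d]] with integer entries, or None if det = 0."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        return None  # non-invertible matrix
    # A^{-1} = (1 / det) * [[d, -b], [-c, a]]
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]
```

With the matrix of part (a), `inverse_2x2([[7, 4], [5, 3]])` gives rows (3, -4) and (-5, 7). A matrix such as `[[6, 2], [9, 3]]`, which has determinant 6 × 3 - 2 × 9 = 0, returns `None`.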


(c) Since det A = 


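2.12 (a) We have
\[
(\mathbf{B}^{-1}\mathbf{A}^{-1})(\mathbf{AB}) = \mathbf{B}^{-1}(\mathbf{A}^{-1}\mathbf{A})\mathbf{B} = \mathbf{B}^{-1}\mathbf{I}\mathbf{B} = \mathbf{B}^{-1}\mathbf{B} = \mathbf{I}
\]
and
\[
(\mathbf{AB})(\mathbf{B}^{-1}\mathbf{A}^{-1}) = \mathbf{A}(\mathbf{B}\mathbf{B}^{-1})\mathbf{A}^{-1} = \mathbf{A}\mathbf{A}^{-1} = \mathbf{I},
\]
hence $(\mathbf{AB})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$.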


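(b) Since $\det \mathbf{A} = 2$, we have
\[
\mathbf{A}^{-1} = \frac{1}{2}\begin{pmatrix} 5 & -2 \\ -4 & 2 \end{pmatrix} = \begin{pmatrix} \frac{5}{2} & -1 \\ -2 & 1 \end{pmatrix}.
\]
Since $\det \mathbf{B} = -2$, we have
\[
\mathbf{B}^{-1} = -\frac{1}{2}\begin{pmatrix} -2 & -4 \\ 1 & 3 \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ -\frac{1}{2} & -\frac{3}{2} \end{pmatrix}.
\]

(c)
\[
(\mathbf{A} + \mathbf{B})^{-1} = \begin{pmatrix} 5 & 6 \\ 3 & 3 \end{pmatrix}^{-1} = -\frac{1}{3}\begin{pmatrix} 3 & -6 \\ -3 & 5 \end{pmatrix} = \begin{pmatrix} -1 & 2 \\ 1 & -\frac{5}{3} \end{pmatrix},
\]
\[
\mathbf{A}^{-1} + \mathbf{B}^{-1} = \begin{pmatrix} \frac{5}{2} & -1 \\ -2 & 1 \end{pmatrix} + \begin{pmatrix} 1 & 2 \\ -\frac{1}{2} & -\frac{3}{2} \end{pmatrix} = \begin{pmatrix} \frac{7}{2} & 1 \\ -\frac{5}{2} & -\frac{1}{2} \end{pmatrix}.
\]
Thus $(\mathbf{A} + \mathbf{B})^{-1} \neq \mathbf{A}^{-1} + \mathbf{B}^{-1}$.

(d)
\[
(\mathbf{AB})^{-1} = \left( \begin{pmatrix} 2 & 2 \\ 4 & 5 \end{pmatrix} \begin{pmatrix} 3 & 4 \\ -1 & -2 \end{pmatrix} \right)^{-1} = \begin{pmatrix} 4 & 4 \\ 7 & 6 \end{pmatrix}^{-1} = -\frac{1}{4}\begin{pmatrix} 6 & -4 \\ -7 & 4 \end{pmatrix} = \begin{pmatrix} -\frac{3}{2} & 1 \\ \frac{7}{4} & -1 \end{pmatrix},
\]
\[
\mathbf{B}^{-1}\mathbf{A}^{-1} = \begin{pmatrix} 1 & 2 \\ -\frac{1}{2} & -\frac{3}{2} \end{pmatrix} \begin{pmatrix} \frac{5}{2} & -1 \\ -2 & 1 \end{pmatrix} = \begin{pmatrix} -\frac{3}{2} & 1 \\ \frac{7}{4} & -1 \end{pmatrix}.
\]
Thus $(\mathbf{AB})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$.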


273) A= 3 | 


w) @ ali] = 
2.14 In each case, the area of the parallelogram is the
magnitude of det A. (Note that in each case the parallelogram
is a rectangle!)

(a) We have det A = 2, so the area of the parallelogram is 2.

[Figure: parallelogram with sides from the origin to (1, 1) and (1, -1)]

(b) We have det A = -20, so the area of the parallelogram is 20.

[Figure: parallelogram with sides from the origin to (3, 1) and (2, -6)]

(c) We have det A = 0, so the area of the parallelogram
is 0.

[Figure: degenerate parallelogram, showing the origin O and the point (4, 2)]
2.15 (a) |“ “l= : “|= |= a4 
|e sl-|a : 

= be — ad = —(ad — be) = —det A 
(c) ; “| =ad be = det A 
@ |o af=|e e]=9 
(e) ‘ ‘ =I | = Mad — be) = kde A 
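(f) $\begin{vmatrix} ka & kb \\ kc & kd \end{vmatrix} = k^2(ad - bc) = k^2 \det \mathbf{A}$

(g)
\[
\begin{vmatrix} a & b \\ c - ma & d - mb \end{vmatrix} = ad - amb - bc + bma = ad - bc = \det \mathbf{A},
\]
\[
\begin{vmatrix} a - mc & b - md \\ c & d \end{vmatrix} = ad - mcd - bc + mdc = ad - bc = \det \mathbf{A}.
\]

(h) From part (f), with $k = (ad - bc)^{-1}$, we have
\[
\begin{vmatrix} \dfrac{d}{ad - bc} & \dfrac{-b}{ad - bc} \\[2ex] \dfrac{-c}{ad - bc} & \dfrac{a}{ad - bc} \end{vmatrix}
= \frac{1}{(ad - bc)^2} \begin{vmatrix} d & -b \\ -c & a \end{vmatrix}
= \frac{1}{ad - bc} = \frac{1}{\det \mathbf{A}}.
\]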
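2.16 (a)
\[
\begin{vmatrix} 4 & 1 & 0 \\ 0 & 2 & -1 \\ 2 & 3 & 1 \end{vmatrix}
= 4\begin{vmatrix} 2 & -1 \\ 3 & 1 \end{vmatrix} - 1\begin{vmatrix} 0 & -1 \\ 2 & 1 \end{vmatrix} + 0\begin{vmatrix} 0 & 2 \\ 2 & 3 \end{vmatrix}
= (4 \times 5) - (1 \times 2) + 0 = 18
\]
(b)
\[
\begin{vmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{vmatrix}
= 1\begin{vmatrix} 5 & 6 \\ 8 & 9 \end{vmatrix} - 2\begin{vmatrix} 4 & 6 \\ 7 & 9 \end{vmatrix} + 3\begin{vmatrix} 4 & 5 \\ 7 & 8 \end{vmatrix}
= (1 \times (-3)) - (2 \times (-6)) + (3 \times (-3)) = -3 + 12 - 9 = 0
\]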
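
The expansion along the first row used in parts (a) and (b) can be written as a short recursive sketch. The function name `det` is hypothetical, and the approach is only practical for small matrices such as those considered here.

```python
def det(m):
    """Determinant of a square matrix (list of rows) by first-row expansion."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        # Minor: delete row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        # Signs alternate +, -, +, ... along the first row.
        total += (-1) ** j * m[0][j] * det(minor)
    return total
```

Applied to the matrices above, `det([[4, 1, 0], [0, 2, -1], [2, 3, 1]])` evaluates to 18 and `det([[1, 2, 3], [4, 5, 6], [7, 8, 9]])` evaluates to 0, in agreement with the hand calculations.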
1 2 3 
4 5 0 5 0 4 
te) a : ; =i\5 =|-2l9 a|+3lo ‘| 


2.17 (a) We have 


2 1 3 2 1 
0 2 1 6 
3 41 


0 1 


3.6 


=| 31 


oat ol-als offs a 
6 

= (2 x 11) — (1 x (—3)) + (3 x (-6)) 

= 22+3-18=7. 
(b) (i) We obtain —7 (interchanging two rows). 
(ii) We obtain 7 (taking the transpose). 
(iii) We obtain —21 (interchanging two rows, as in (i), 
then multiplying column 2 by 3). 


(c) Expanding by the first column, we have 


1 -1l 3.2 3 2 
det A=1), 3|-9(3 s|+(3 il=s. 
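We obtain
\[
\mathbf{U} = \begin{pmatrix} 1 & 3 & 2 \\ 0 & 1 & -1 \\ 0 & 0 & 5 \end{pmatrix},
\]
then
\[
\det \mathbf{U} = 1\begin{vmatrix} 1 & -1 \\ 0 & 5 \end{vmatrix} - 3\begin{vmatrix} 0 & -1 \\ 0 & 5 \end{vmatrix} + 2\begin{vmatrix} 0 & 1 \\ 0 & 0 \end{vmatrix} = 5.
\]

The product of the diagonal elements of U is also
1 × 1 × 5 = 5.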


2.18 We have 
1 -4 2 


ee | 


3.3 
8 9 


= —-124+12+4 20 = 20. 


1 
Also, detU=|0 10 -—3]=20=detA. 
0 


i) 


1 
(b) bxc=/|1 2 3 
6 


i) 
w 


= —Ti+ 14j— 7k 


2 


ePNMmr 
ar Ww 
II 
“I 


1 1 
2.22 The area is +/det |3 1 
ls 2 


l| 


g|-5 +545] = 2.5. 


2.23 AB 


1 
2 
1 -l 0 
3 
5 


OL dt cd 


so we see that AB is not symmetric. 


or FR 


FOr 


i) 




35 O 
(AB)F=/2 1 1], 
ae ea 
o£. 01 ft 2 4 
BTAT=BA=]1 0 1]/2 3 -1 
Gt at) Ie eeb gD 
3 5 O 
=/2 1 1], 
a ee 


so $(\mathbf{AB})^T = \mathbf{B}^T \mathbf{A}^T$.


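2.24 $\det \mathbf{A} = 1 - (-2) = 3$ and $\mathbf{A}^{-1} = \dfrac{1}{3}\begin{pmatrix} 1 & -2 \\ 1 & 1 \end{pmatrix}$.

Putting $\mathbf{x} = [x \quad y]^T$ and $\mathbf{b} = [1 \quad -1]^T$, we need to
solve $\mathbf{Ax} = \mathbf{b}$. Multiplying both sides on the left by
$\mathbf{A}^{-1}$, we obtain $\mathbf{A}^{-1}\mathbf{Ax} = \mathbf{A}^{-1}\mathbf{b}$. This simplifies to

\[
\mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix} = \frac{1}{3}\begin{pmatrix} 1 & -2 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ -1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix},
\]

so x = 1 and y = 0.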


Section 3 


3.1 Suppose that the line is $y = a_0 + a_1 x$, so the
interpolation equations are

\[
\begin{aligned}
a_0 + 0.6a_1 &= 1.856, \\
a_0 + 0.8a_1 &= 2.3472.
\end{aligned}
\]
The solution of this pair of equations is $a_0 = 0.3824$ and
$a_1 = 2.456$, so the equation of the interpolating straight
line is

\[ y = 0.3824 + 2.456x. \]
Thus y(0.65) = 1.9788 and y(0.47) = 1.536 72.


3.2 Suppose that the quadratic is $y = a_0 + a_1 x + a_2 x^2$,
so the interpolation equations are

\[
\begin{aligned}
a_0 + 0.2a_1 + (0.2)^2 a_2 &= 1.2, \\
a_0 + 0.4a_1 + (0.4)^2 a_2 &= 1.48, \\
a_0 + 0.6a_1 + (0.6)^2 a_2 &= 1.856.
\end{aligned}
\]
The augmented matrix form of these equations is

\[
\mathbf{X}|\mathbf{y} = \left(\begin{array}{ccc|c} 1 & 0.2 & 0.04 & 1.2 \\ 1 & 0.4 & 0.16 & 1.48 \\ 1 & 0.6 & 0.36 & 1.856 \end{array}\right).
\]

After Stage 1 of the Gaussian elimination method, we
have

\[
\mathbf{U}|\mathbf{c} = \left(\begin{array}{ccc|c} 1 & 0.2 & 0.04 & 1.2 \\ 0 & 0.2 & 0.12 & 0.28 \\ 0 & 0 & 0.08 & 0.096 \end{array}\right).
\]
Back substitution gives $\mathbf{a} = [1.016 \quad 0.68 \quad 1.2]^T$, so the
equation of the interpolating quadratic is
\[ y = 1.016 + 0.68x + 1.2x^2. \]
This gives y(0.47) = 1.600 68. This estimate is very
close to the value 1.598 496 obtained in Example 3.2.
However, since the three points closest to x = 0.47 are
0.2, 0.4 and 0.6, the value 1.600 68 would be regarded
as the most reliable of the four estimates that we have
so far computed, in Examples 3.1 and 3.2, and
Exercises 3.1 and 3.2.
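3.3 The required matrices are

\[
\mathbf{X} = \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 2 \\ 1 & 3 \end{pmatrix}, \quad
\mathbf{a} = \begin{pmatrix} a_0 \\ a_1 \end{pmatrix}, \quad
\mathbf{y} = \begin{pmatrix} 2 \\ 2 \\ 2 \\ 4 \end{pmatrix}.
\]
We compute
\[
\mathbf{X}^T\mathbf{X} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 2 & 3 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 2 \\ 1 & 3 \end{pmatrix} = \begin{pmatrix} 4 & 8 \\ 8 & 18 \end{pmatrix},
\]
\[
\mathbf{X}^T\mathbf{y} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 2 & 3 \end{pmatrix} \begin{pmatrix} 2 \\ 2 \\ 2 \\ 4 \end{pmatrix} = \begin{pmatrix} 10 \\ 22 \end{pmatrix}.
\]
The solution of the system of linear equations
$(\mathbf{X}^T\mathbf{X})\mathbf{a} = \mathbf{X}^T\mathbf{y}$ is $\mathbf{a} = [0.5 \quad 1]^T$, so the least squares
straight line for the given data points is

\[ y = 0.5 + x. \]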


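3.4 The required matrices are

\[
\mathbf{X} = \begin{pmatrix} 1 & -1 \\ 1 & 0 \\ 1 & 1 \end{pmatrix}, \quad
\mathbf{a} = \begin{pmatrix} a_0 \\ a_1 \end{pmatrix}, \quad
\mathbf{y} = \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}.
\]

We compute

\[
\mathbf{X}^T\mathbf{X} = \begin{pmatrix} 1 & 1 & 1 \\ -1 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 1 & 0 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix},
\]
\[
\mathbf{X}^T\mathbf{y} = \begin{pmatrix} 1 & 1 & 1 \\ -1 & 0 & 1 \end{pmatrix} \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 2 \end{pmatrix}.
\]

The solution of the system of linear equations
$(\mathbf{X}^T\mathbf{X})\mathbf{a} = \mathbf{X}^T\mathbf{y}$ is $\mathbf{a} = [0 \quad 1]^T$, so the least squares
straight line for the given data points is

\[ y = x. \]
This result might have been expected, since all the data
points lie on the line y = x.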


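3.5 The required matrices are

\[
\mathbf{X} = \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{pmatrix}, \quad
\mathbf{a} = \begin{pmatrix} a_0 \\ a_1 \end{pmatrix}, \quad
\mathbf{y} = \begin{pmatrix} 1 \\ 4 \\ 9 \\ 16 \end{pmatrix}.
\]
We compute

\[
\mathbf{X}^T\mathbf{X} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{pmatrix} = \begin{pmatrix} 4 & 10 \\ 10 & 30 \end{pmatrix},
\]
\[
\mathbf{X}^T\mathbf{y} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \end{pmatrix} \begin{pmatrix} 1 \\ 4 \\ 9 \\ 16 \end{pmatrix} = \begin{pmatrix} 30 \\ 100 \end{pmatrix}.
\]

The solution of the system of linear equations
$(\mathbf{X}^T\mathbf{X})\mathbf{a} = \mathbf{X}^T\mathbf{y}$ is $\mathbf{a} = [-5 \quad 5]^T$, so the least squares
straight line for the given data points is

\[ y = -5 + 5x. \]

This solution is not particularly good, since the values
it gives at the data points are 0, 5, 10 and 15, rather
than 1, 4, 9 and 16, respectively. However, a straight
line is not really appropriate for these data, which lie
on the quadratic $y = x^2$; see the figure below.

[Figure: the data points and the line y = -5 + 5x, plotted for x from 0 to 4]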
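
For a straight line, the normal equations $(\mathbf{X}^T\mathbf{X})\mathbf{a} = \mathbf{X}^T\mathbf{y}$ used in Solutions 3.3 to 3.5 form a 2 × 2 system that can be solved directly. The sketch below assumes integer data (so that exact fractions can be used) and at least two distinct x-values; the function name `least_squares_line` is hypothetical.

```python
from fractions import Fraction


def least_squares_line(xs, ys):
    """Return (a0, a1) for the least squares line y = a0 + a1 * x.

    Solves the normal equations
        [ n    sx  ] [a0]   [ sy  ]
        [ sx   sxx ] [a1] = [ sxy ]
    by Cramer's rule, where the sums run over the data points.
    """
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # determinant of X^T X
    a0 = Fraction(sy * sxx - sx * sxy, det)
    a1 = Fraction(n * sxy - sx * sy, det)
    return a0, a1
```

For the data of Solution 3.5, `least_squares_line([1, 2, 3, 4], [1, 4, 9, 16])` returns (-5, 5), matching the line y = -5 + 5x.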


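3.6 (a) For the least squares straight line, we have

\[
\mathbf{X} = \begin{pmatrix} 1 & -1 \\ 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{pmatrix}, \quad
\mathbf{y} = \begin{pmatrix} 0 \\ -1 \\ 0 \\ 1 \end{pmatrix}, \quad
\mathbf{a} = \begin{pmatrix} a_0 \\ a_1 \end{pmatrix}.
\]

We compute
\[
\mathbf{X}^T\mathbf{X} = \begin{pmatrix} 4 & 2 \\ 2 & 6 \end{pmatrix}, \quad \mathbf{X}^T\mathbf{y} = \begin{pmatrix} 0 \\ 2 \end{pmatrix}.
\]
The solution of the system of linear equations
$(\mathbf{X}^T\mathbf{X})\mathbf{a} = \mathbf{X}^T\mathbf{y}$ is $\mathbf{a} = [-0.2 \quad 0.4]^T$, so the least
squares straight line is
\[ y = -0.2 + 0.4x. \]

The deviation vector d is given by

\[
\mathbf{d} = \begin{pmatrix} 1 & -1 \\ 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} -0.2 \\ 0.4 \end{pmatrix} - \begin{pmatrix} 0 \\ -1 \\ 0 \\ 1 \end{pmatrix}
= \begin{pmatrix} -0.6 \\ 0.8 \\ 0.2 \\ -0.4 \end{pmatrix},
\]

so the least squares deviation is $\mathbf{d}^T\mathbf{d} = 1.2$.

(b) For the interpolating cubic polynomial, the required
matrices are

\[
\mathbf{X} = \begin{pmatrix} 1 & -1 & 1 & -1 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ 1 & 2 & 4 & 8 \end{pmatrix}, \quad
\mathbf{y} = \begin{pmatrix} 0 \\ -1 \\ 0 \\ 1 \end{pmatrix}, \quad
\mathbf{a} = \begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{pmatrix}.
\]
The solution to $\mathbf{Xa} = \mathbf{y}$ is $\mathbf{a} = [-1 \quad \tfrac{1}{3} \quad 1 \quad -\tfrac{1}{3}]^T$, so
the interpolating cubic polynomial is
\[ y = -1 + \tfrac{1}{3}x + x^2 - \tfrac{1}{3}x^3. \]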
Section 4 


4.1 The change in x is

\[ \delta\mathbf{x} = \overline{\mathbf{x}} - \mathbf{x} = [0.04 \quad 0.03 \quad -0.05]^T, \]
so the absolute change is

\[ \|\delta\mathbf{x}\| = 0.05. \]


4.2 The row sums of the magnitudes of the elements
of $\mathbf{A}^{-1}$ are 20, 74 and 59. Hence the absolute condition
number is $k_a = 74$.


4.3 (a) The inverse matrix is 


The problem is absolutely well-conditioned, since
$k_a = 1$. (The solution is


~~ 4)-E 


though this was not asked for.) 


(b) The inverse matrix is 
-1_ |-35 25 
AS | 50 | , 


The problem is absolutely ill-conditioned, since $k_a = 85$.
(The solution is 


0 [2]-[] 


though this was not asked for.) 


Study guide for Unit 10


This unit continues the study of linear algebra begun in Unit 9. It assumes 
knowledge of: 

e matrices to represent linear transformations of points in the plane; 

e the solution of systems of linear equations in two and three variables; 

e the expansion of 2 x 2 and 3 x 3 determinants. 

From the point of view of later studies, Sections 2 and 3 contain the most 
important material. 


Section 1 sets the scene and provides examples for later sections. In Subsec- 
tion 1.2 you will see examples that give some indication of the importance 
of matrices, particularly eigenvalues, in applied mathematics. 


Sections 2 and 3 contain the basic algebraic techniques that you need to 
understand before proceeding to Unit 11. 


In Section 3 we shall need to expand 3 x 3 determinants and solve cubic 
equations. For all the cubic equations that we ask you to solve by hand, one 
root will be easy to find, so you have to solve only a quadratic equation. For 
more general problems, use the computer algebra package for the course. 


Section 4 is devoted to numerical methods for finding eigenvalues and eigen- 
vectors, often used in real applications. These are important ideas, but we 
do not extend them further in this course. 


The material in Sections 1-3 will be needed later in the course, particularly
in Unit 11. The eigenvalues and eigenvectors that you calculate in Sections 2
and 3 will be used throughout Unit 11.
Unit 10 Eigenvalues and eigenvectors
Introduction 


Consider the following simplified migration problem. 


The towns Exton and Wyeville have a regular interchange of population:
each year, one-tenth of Exton’s population migrates to Wyeville, while
one-fifth of Wyeville’s population migrates to Exton (see Figure 0.1). Other
changes in population, such as births, deaths and other migrations, cancel
each other and so can be ignored. If $x_n$ and $y_n$ denote, respectively, the
populations of Exton and Wyeville at the beginning of year n, then the
corresponding populations at the beginning of year n + 1 are given by

\[
\begin{cases}
x_{n+1} = 0.9x_n + 0.2y_n, \\
y_{n+1} = 0.1x_n + 0.8y_n,
\end{cases}
\]

or, in matrix form,

\[
\begin{pmatrix} x_{n+1} \\ y_{n+1} \end{pmatrix} = \begin{pmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{pmatrix} \begin{pmatrix} x_n \\ y_n \end{pmatrix}.
\]

[Figure 0.1: migration between Exton and Wyeville]

A matrix such as $\begin{pmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{pmatrix}$ is called a transition matrix. The entries in such a matrix
are all non-negative, and the entries in each column sum to 1.

This is an example of an iterative process, in which the values associated
with the (n + 1)th iterate can be determined from the values associated
with the nth iterate.

Suppose that initially the population of Exton is 10 000 and that of Wyeville
is 8000, i.e. $x_0 = 10\,000$ and $y_0 = 8000$. Then after one year the populations
are given by

\[
\begin{pmatrix} x_1 \\ y_1 \end{pmatrix} = \begin{pmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{pmatrix} \begin{pmatrix} 10\,000 \\ 8000 \end{pmatrix} = \begin{pmatrix} 10\,600 \\ 7400 \end{pmatrix},
\]
i.e. $x_1 = 10\,600$ and $y_1 = 7400$, and after two years they are given by

\[
\begin{pmatrix} x_2 \\ y_2 \end{pmatrix} = \begin{pmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{pmatrix} \begin{pmatrix} 10\,600 \\ 7400 \end{pmatrix} = \begin{pmatrix} 11\,020 \\ 6980 \end{pmatrix},
\]
i.e. $x_2 = 11\,020$ and $y_2 = 6980$.

We can continue this process as far as we wish.

What happens in the long term? Do the populations eventually stabilize?
Using the computer algebra package for the course, we can verify that the
populations after 30 years are $x_{30} = 12\,000$ and $y_{30} = 6000$. It follows that
after 31 years the populations are given by

\[
\begin{pmatrix} x_{31} \\ y_{31} \end{pmatrix} = \begin{pmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{pmatrix} \begin{pmatrix} 12\,000 \\ 6000 \end{pmatrix}
= \begin{pmatrix} 0.9 \times 12\,000 + 0.2 \times 6000 \\ 0.1 \times 12\,000 + 0.8 \times 6000 \end{pmatrix} = \begin{pmatrix} 12\,000 \\ 6000 \end{pmatrix},
\]
i.e. $x_{31} = 12\,000$ and $y_{31} = 6000$.

So, if $x_{30} = 12\,000$ and $y_{30} = 6000$, then $x_{31} = x_{30}$ and $y_{31} = y_{30}$ and the
sizes of the populations of the two towns do not change. Moreover, the sizes
of the populations will not change in any subsequent year.
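
The iteration above is easy to reproduce. The following sketch is not the course's computer algebra package: it simply applies the transition matrix repeatedly using exact fractions, and the function name `migrate` is hypothetical. Populations are rounded to the nearest whole number only when reported.

```python
from fractions import Fraction


def migrate(x, y, years):
    """Apply the Exton/Wyeville transition matrix `years` times."""
    x, y = Fraction(x), Fraction(y)
    for _ in range(years):
        # x_{n+1} = 0.9 x_n + 0.2 y_n,  y_{n+1} = 0.1 x_n + 0.8 y_n
        x, y = (Fraction(9, 10) * x + Fraction(2, 10) * y,
                Fraction(1, 10) * x + Fraction(8, 10) * y)
    return x, y


x30, y30 = migrate(10000, 8000, 30)
print(round(x30), round(y30))
```

One and two applications give (10 600, 7400) and (11 020, 6980), as in the text. After 30 applications the exact values differ from 12 000 and 6000 by less than one person, so they round to the figures quoted.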


There are situations, such as the above migration problem, where a particu- 
lar non-zero vector does not change under a linear transformation. However, 
this is more the exception than the rule. It is more useful to investigate vec- 
tors that are transformed to scalar multiples of themselves — geometrically 
this means that each such vector is transformed into another vector in the 
same or the opposite direction. Such vectors are called eigenvectors, and 
the corresponding scalar multipliers are called eigenvalues. 


Section 1 considers the eigenvectors and eigenvalues associated with vari- 
ous linear transformations of the plane. We also outline situations where 
eigenvectors and eigenvalues are useful. 
In many problems, such as in the above migration problem, the appropriate
linear transformation is given in matrix form. In Sections 2 and 3 we
investigate the various types of eigenvalue that can arise for 2 × 2 and 3 × 3
matrices.


In the migration problem, we iterated to obtain the ‘steady-state’ popula- 
tion. Section 4 follows a similar method in order to calculate the eigenvalues 
and eigenvectors of a matrix. Although our discussion is mainly about 2 x 2 
and 3 x 3 matrices, many of the ideas can be extended to the much larger 
matrices that often arise from practical applications. 


1 Introducing eigenvectors 


We first investigate the eigenvectors and eigenvalues that arise from various 
linear transformations of the plane — in particular, scaling, reflection and 
rotation. Do not worry about how we construct the transformation matrices 
— that is not important. It is the geometric properties of these matrices, and 
the relevance of these properties to eigenvectors and eigenvalues, that are 
important here. Later in this section we outline another use of eigenvectors 
and eigenvalues. 


1.1 Eigenvectors in the plane 


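Consider the linear transformation of the plane specified by the matrix
\[
\mathbf{A} = \begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix}.
\]
Using matrix multiplication, we can find the image under this linear
transformation of any given vector in the plane. For example, to find the image
of the vector $\mathbf{v} = [2 \quad 1]^T$, we calculate

\[
\mathbf{Av} = \begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix} \begin{pmatrix} 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 8 \\ 6 \end{pmatrix},
\]

so the required image is $[8 \quad 6]^T$, as shown in Figure 1.1. In general, to find the image
under this linear transformation of the vector $\mathbf{v} = [x \quad y]^T$, we calculate

\[
\mathbf{Av} = \begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 3x + 2y \\ x + 4y \end{pmatrix},
\]

so the required image is $[3x + 2y \quad x + 4y]^T$.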


Exercise 1.1 


Find the image of each of the vectors 


w-[ 2], x= [2]: 9=[2]. == [2] 


under the transformation matrix $\mathbf{A} = \begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix}$.


In Exercise 1.1 you saw that the image of $\mathbf{x} = [1 \quad 1]^T$ is $[5 \quad 5]^T$. This image
is a vector in the same direction as x, but with five times the magnitude, as
shown in Figure 1.2. In symbols, we write this transformation as $\mathbf{Ax} = 5\mathbf{x}$.


(Here ‘steady-state’ refers to the fact that the populations no longer change.)

[Figure 1.1: the vector (2, 1) and its image (8, 6)]

[Figure 1.2: the vector (1, 1) and its image (5, 5)]
In fact, as you can easily check, any vector in the same direction as
$\mathbf{x} = [1 \quad 1]^T$ (or in the direction opposite to x) is transformed into another vector
in the same (or opposite) direction, but with five times the magnitude,
as shown in Figure 1.3. Non-zero vectors that are transformed to scalar
multiples of themselves are called eigenvectors; the corresponding scalar
multipliers are called eigenvalues. For the above 2 × 2 matrix A, $[1 \quad 1]^T$ is
an eigenvector, with corresponding eigenvalue 5, since

\[
\begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 5 \\ 5 \end{pmatrix} = 5 \begin{pmatrix} 1 \\ 1 \end{pmatrix}.
\]
Similarly, since (from Exercise 1.1)
\[
\begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix} \begin{pmatrix} -2 \\ 1 \end{pmatrix} = \begin{pmatrix} -4 \\ 2 \end{pmatrix} = 2 \begin{pmatrix} -2 \\ 1 \end{pmatrix},
\]

we can see that $[-2 \quad 1]^T$ is also an eigenvector, with corresponding
eigenvalue 2.


Definitions 


Let A be any square matrix. A non-zero vector v is an eigenvector
of A if $\mathbf{Av} = \lambda\mathbf{v}$ for some number $\lambda$. The number $\lambda$ is the corresponding
eigenvalue.


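Any non-zero multiple of an eigenvector is also an eigenvector with the same
eigenvalue. For example, if $\mathbf{A} = \begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix}$, then for any non-zero number k,
each vector $[k \quad k]^T$ is an eigenvector corresponding to the eigenvalue 5,
because

\[
\begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix} \begin{pmatrix} k \\ k \end{pmatrix} = 5 \begin{pmatrix} k \\ k \end{pmatrix};
\]

also, each vector $[-2k \quad k]^T$ is an eigenvector
corresponding to the eigenvalue 2, because

\[
\begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix} \begin{pmatrix} -2k \\ k \end{pmatrix} = 2 \begin{pmatrix} -2k \\ k \end{pmatrix}.
\]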
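
The definition can be checked mechanically for any candidate vector. The sketch below assumes integer entries and uses hypothetical names (`mat_vec`, `eigenvalue_of`); it returns the eigenvalue when the vector is an eigenvector and `None` otherwise, including for the zero vector.

```python
from fractions import Fraction


def mat_vec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def eigenvalue_of(A, v):
    """Return lambda if A v = lambda v for the non-zero vector v, else None."""
    if all(x == 0 for x in v):
        return None  # the zero vector is excluded by definition
    w = mat_vec(A, v)
    i = next(k for k, x in enumerate(v) if x != 0)
    lam = Fraction(w[i], v[i])  # the only possible multiplier
    if all(wi == lam * vi for wi, vi in zip(w, v)):
        return lam
    return None
```

With `A = [[3, 2], [1, 4]]`, the calls `eigenvalue_of(A, [7, 7])` and `eigenvalue_of(A, [-6, 3])` return 5 and 2 respectively, illustrating that multiples of an eigenvector share its eigenvalue, while `eigenvalue_of(A, [2, 1])` returns `None`.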


We defined linear dependence for the rows of a matrix in Unit 9. The concept 
may be extended to any collection of vectors as follows. 


Definitions 


The vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ are linearly dependent if a non-zero linear
combination of these vectors is equal to the zero vector, i.e.

\[ q_1\mathbf{v}_1 + q_2\mathbf{v}_2 + \cdots + q_n\mathbf{v}_n = \mathbf{0} \]
where the numbers $q_1, q_2, \ldots, q_n$ are not all zero.


Vectors that are not linearly dependent are linearly independent. 


The nature of the eigenvectors of a matrix is closely bound to the notion
of linear dependence. Two (non-zero) vectors are linearly dependent if, and
only if, one is a multiple of the other. Thus the eigenvectors $[1 \quad 1]^T$ and
$[2 \quad 2]^T$ are linearly dependent, as are all the eigenvectors of the form $[k \quad k]^T$
corresponding to the eigenvalue 5 of the matrix A above. Similarly, all the
eigenvectors of the form $[-2k \quad k]^T$ corresponding to the eigenvalue 2 of A
are linearly dependent. However, the eigenvectors $[1 \quad 1]^T$ and $[-2 \quad 1]^T$ are
linearly independent.


The following exercise illustrates a property of linearly independent vectors 
that will prove useful later. 
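[Figure 1.3: the vector (k, k) and its image (5k, 5k)]

(That any non-zero multiple of an eigenvector v is again an eigenvector is a
consequence of the fact that $\mathbf{A}(k\mathbf{v}) = k(\mathbf{Av}) = k(\lambda\mathbf{v}) = \lambda(k\mathbf{v})$ for any
scalar k.)

(For two vectors, if $q_1\mathbf{v}_1 + q_2\mathbf{v}_2 = \mathbf{0}$ and $q_1 \neq 0$, then
$\mathbf{v}_1 = -\dfrac{q_2}{q_1}\mathbf{v}_2$. Similarly, if $q_2 \neq 0$, then
$\mathbf{v}_2 = -\dfrac{q_1}{q_2}\mathbf{v}_1$.)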


*Exercise 1.2 


The vectors $\mathbf{v}_1 = [1 \quad 1]^T$ and $\mathbf{v}_2 = [-2 \quad 1]^T$ are linearly independent. Show
that for any vector $\mathbf{v} = [x \quad y]^T$, there are numbers $\alpha$ and $\beta$ such that
$\mathbf{v} = \alpha\mathbf{v}_1 + \beta\mathbf{v}_2$. (In other words, show that any two-dimensional vector v
can be written as a linear combination of $\mathbf{v}_1$ and $\mathbf{v}_2$.)


The result in Exercise 1.2 generalizes to any two linearly independent vectors.
That is, if $\mathbf{v}_1$ and $\mathbf{v}_2$ are any pair of linearly independent
two-dimensional vectors and v is any two-dimensional vector, then we can find
numbers $\alpha$ and $\beta$ such that $\mathbf{v} = \alpha\mathbf{v}_1 + \beta\mathbf{v}_2$.
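
Finding the two numbers amounts to solving a pair of linear equations, one for each component. The sketch below does this by Cramer's rule for integer vectors; the function name `decompose` is hypothetical, and a zero determinant signals that the two given vectors are linearly dependent.

```python
from fractions import Fraction


def decompose(v1, v2, v):
    """Return (alpha, beta) with v = alpha * v1 + beta * v2."""
    # Determinant of the matrix whose columns are v1 and v2.
    det = v1[0] * v2[1] - v2[0] * v1[1]
    if det == 0:
        raise ValueError("v1 and v2 are linearly dependent")
    alpha = Fraction(v[0] * v2[1] - v2[0] * v[1], det)
    beta = Fraction(v1[0] * v[1] - v[0] * v1[1], det)
    return alpha, beta
```

For the vectors of Exercise 1.2, `decompose([1, 1], [-2, 1], [4, 1])` returns (2, -1), and indeed 2[1 1] - [-2 1] = [4 1]. In general this choice of vectors gives the values (x + 2y)/3 and (y - x)/3.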


Returning to eigenvectors, we require an eigenvector to be non-zero. This
is because $\mathbf{A0} = \mathbf{0}$ for every square matrix A, so $\mathbf{0}$ would otherwise be an
eigenvector for all square matrices. Thus there is no point in including $\mathbf{0}$ as
an eigenvector. However, it is possible for an eigenvalue to be 0. For exam- 


F 2 1 1 2 1 1 
ple, if we choose A = | 7 | ond v=| 9], then Av= | 7 | |_3|- 


H =0 |_| , so [1 —2] is an eigenvector of A with eigenvalue 0. 


*Exercise 1.3 


In each of the following cases, verify that v is an eigenvector of A, and write 
down the corresponding eigenvalue. 


waft} Bh ma-B a) Ll] 


Exercise 1.4 


Write down an eigenvector and the corresponding eigenvalue for the matrix
associated with the migration problem in the Introduction, i.e.
$\begin{pmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{pmatrix}$.


Very occasionally it is possible to determine information about the eigen- 
vectors and eigenvalues of a given matrix from its geometric properties. This 
is so for each of the cases in Table 1.1 below (where the image of the unit 
square is used in each case to indicate the behaviour of the transformation). 


In the case of the matrix $\begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix}$, it is clear from the geometric properties
of the linear transformation that vectors along the coordinate axes are
eigenvectors, as these are transformed to vectors in the same directions.


In the case of the matrix $\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$, we see that reflection in the x-axis leaves
the vector $[1 \quad 0]^T$ unchanged, and reverses the direction of $[0 \quad 1]^T$, so these
must be eigenvectors.


In the third case (rotation through $\pi/4$ anticlockwise about the origin), we do
not expect to find any real eigenvectors because the direction of every vector
is changed by the linear transformation. Surprisingly, even this matrix has
eigenvectors of a kind, namely complex ones, but we have to adopt the
algebraic approach of Section 2 in order to find them.


(The result of Exercise 1.2 generalizes further. If $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ are any n
linearly independent n-dimensional vectors and v is any n-dimensional vector,
then we can find numbers $\alpha_1, \alpha_2, \ldots, \alpha_n$ such that
$\mathbf{v} = \alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \cdots + \alpha_n\mathbf{v}_n$.
We do not prove this result here.)


(The unit square has vertices (0, 0), (1, 0), (1, 1) and (0, 1).)
Table 1.1

Matrix | Comment | Transformation of the unit square | Eigenvectors | Eigenvalues
$\begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix}$ | A scaling by 3 in the x-direction and by 2 in the y-direction (i.e. a (3, 2) scaling) | [figure] | $[1 \quad 0]^T$, $[0 \quad 1]^T$ | 3, 2
$\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$ | A reflection in the x-axis | [figure] | $[1 \quad 0]^T$, $[0 \quad 1]^T$ | 1, -1
$\begin{pmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}$ | A rotation through $\pi/4$ anticlockwise about the origin | [figure] | No real eigenvectors | No real eigenvalues


Exercise 1.5


The matrix $\mathbf{A} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ corresponds to reflection in a line through the
origin at an angle $\pi/4$ to the x-axis. What are the eigenvectors of A and their
corresponding eigenvalues?


(Hint: Find two lines through the origin that are transformed to themselves,
then consider what happens to a point on each line.)


1.2 Simultaneous differential equations 


We conclude this section by outlining another application of eigenvectors 
and eigenvalues which you will meet again in Unit 11. 


Many mathematical models give rise to systems of simultaneous differential 
equations relating two or more functions and their derivatives. A simple 
case is shown in Example 1.1; we shall discuss the relevance of eigenvectors 
and eigenvalues to these equations after we have solved them. 


Example 1.1 


Determine the general solution of the pair of differential equations

\[
\begin{aligned}
\dot{x} &= 3x + 2y, \qquad (1.1) \\
\dot{y} &= x + 4y, \qquad (1.2)
\end{aligned}
\]

where x(t) and y(t) are functions of the independent variable t. (Here $\dot{x}$ is
dx/dt and $\dot{y}$ is dy/dt.)
Solution


In order to solve such a system of equations, we need to find specific formulae
for x and y in terms of t.


From Equation (1.2) we have

\[ x = \dot{y} - 4y, \qquad (1.3) \]

and differentiating this equation with respect to t gives
\[ \dot{x} = \ddot{y} - 4\dot{y}. \qquad (1.4) \]


Substituting for $x$ and $\dot{x}$ in Equation (1.1) using Equations (1.3) and (1.4),
we obtain $\ddot{y} - 4\dot{y} = 3(\dot{y} - 4y) + 2y$, which simplifies to
\[ \ddot{y} - 7\dot{y} + 10y = 0. \]
(This is a type of differential equation that you should recognize.)


The general solution of this equation is obtained by solving the auxiliary
equation $\lambda^2 - 7\lambda + 10 = 0$. This has solutions $\lambda = 5$ and $\lambda = 2$, so the
required general solution for y is

\[ y = Ae^{5t} + Be^{2t}, \]
where A and B are arbitrary constants.
It follows from Equation (1.3) that
\[
\begin{aligned}
x &= \dot{y} - 4y \\
&= (5Ae^{5t} + 2Be^{2t}) - 4(Ae^{5t} + Be^{2t}) \\
&= Ae^{5t} - 2Be^{2t}.
\end{aligned}
\]
Thus the general solution of the system of equations is

\[
\begin{cases}
x = Ae^{5t} - 2Be^{2t}, \\
y = Ae^{5t} + Be^{2t}.
\end{cases}
\qquad (1.5)
\]

What is the connection between eigenvalues and eigenvectors, and the equations
of Example 1.1? The differential equations (1.1) and (1.2) can be
written in matrix form as

\[
\begin{pmatrix} \dot{x} \\ \dot{y} \end{pmatrix} = \begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}. \qquad (1.6)
\]

The 2 × 2 matrix in Equation (1.6) has eigenvectors $[1 \quad 1]^T$ and $[-2 \quad 1]^T$
with corresponding eigenvalues 5 and 2, respectively, as we saw in
Subsection 1.1.


These eigenvectors and eigenvalues appear in the matrix form of the general
solution given in Equations (1.5):

\[
\begin{pmatrix} x \\ y \end{pmatrix} = A \begin{pmatrix} 1 \\ 1 \end{pmatrix} e^{5t} + B \begin{pmatrix} -2 \\ 1 \end{pmatrix} e^{2t}.
\]

The first term on the right-hand side involves $[1 \quad 1]^T$, an eigenvector of the
matrix of coefficients $\begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix}$, and a term $e^{5t}$, where 5 is the corresponding
eigenvalue. The second term on the right-hand side is of the same form,
but it involves the other eigenvector and corresponding eigenvalue. That
the general solution of a system of differential equations can be written
explicitly in terms of eigenvectors and eigenvalues is no coincidence, as we
shall explain in Unit 11.
End-of-section Exercises


*Exercise 1.6 


Show that [5 | and H are eigenvectors of the matrix F | , and find 
the corresponding eigenvalues. 

Exercise 1.7 

The matrix $\mathbf{A} = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}$ represents reflection in the y-axis. Write down
two eigenvectors and the corresponding eigenvalues.


(Hint: Find two lines through the origin that are transformed to themselves, 
then consider what happens to a point on each line.) 


Exercise 1.8 


Find a matrix $\mathbf{A} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ for which $[1 \quad 2]^T$ is an eigenvector with
corresponding eigenvalue 2, and $[3 \quad 1]^T$ is an eigenvector with corresponding
eigenvalue 1.


2 Eigenvalues and eigenvectors of 2 x 2 
matrices 


In Section 1 we considered the linear transformation of the plane specified by
the matrix $\mathbf{A} = \begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix}$. We saw that $[1 \quad 1]^T$ and $[-2 \quad 1]^T$ are eigenvectors
of A, the corresponding eigenvalues being 5 and 2, respectively.


We did not show in Section 1 that these are the only possible eigenvalues 
of A, nor did we show you how to find the eigenvectors and corresponding 
eigenvalues for an arbitrary matrix A. In this section we use algebraic 
techniques to show you how to calculate all the eigenvalues and eigenvectors 
of any given 2 x 2 matrix A. We also investigate the three situations that 
arise when the eigenvalues are: 


e distinct real numbers; 
e one real number repeated; 
e complex numbers. 


Before we begin the discussion, we need to remind you of some results from 
Unit 9. 


Generally, a pair of equations with zero right-hand sides, such as 


ee, 


aa (2.1) 


has just one solution x = 0, y = 0. But this is not the case for every pair of
equations of this type. For example, the equations

\[
\begin{cases}
x - 2y = 0, \\
-2x + 4y = 0,
\end{cases}
\qquad (2.2)
\]

have a solution x = 2, y = 1, and another solution x = 6, y = 3. In fact,
x = 2k, y = k is a solution for every value of k, i.e. there is an infinite number
of solutions. This is because the equations x - 2y = 0 and -2x + 4y = 0
are essentially the same equation, since either one of the equations can be
obtained from the other by multiplying by a suitable factor.

(Examples of the three types of eigenvalue listed above are: 5 and 2; 2 and 2;
$i$ and $-i$, where $i^2 = -1$.)


Example 2.1

Find a solution (other than x = y = 0) of the equations

\[
\begin{aligned}
3x - 5y &= 0, \\
-6x + 10y &= 0.
\end{aligned}
\]

(The second equation can be obtained from the first equation by multiplying it
by -2.)

Solution

There is an infinite number of solutions. For example, choosing x = 5, we
find (from either equation) that y = 3, which gives us a solution of the pair of
equations. (Any other non-zero choice of x would give another solution. The
general solution is x = 5k, y = 3k, for any constant k.)


You saw in Unit 9 that a system of linear equations Ax = b has a unique 
solution if, and only if, the matrix of coefficients A is invertible. Thus if 
the system has an infinite number of solutions, then the matrix A must be 
non-invertible. 

Thus the coefficient matrix $\begin{pmatrix} 1 & -2 \\ -2 & 4 \end{pmatrix}$ of Equations (2.2) must be
non-invertible, as must the coefficient matrix $\begin{pmatrix} 3 & -5 \\ -6 & 10 \end{pmatrix}$ of Example 2.1, whereas


the coefficient matrix | | of Equations (2.1) must be invertible. Recall 


1 -l 
from Unit 9 that a matrix A is non-invertible if and only if det A = 0. You 
may like to check that the determinants of the first two matrices are zero, 
but that of the last is non-zero. 


2.1 Basic method 


We now look at a method of finding the eigenvalues and the corresponding 
eigenvectors of an arbitrary 2 x 2 matrix A. Consider the following example. 


Example 2.2 


Find the eigenvalues and corresponding eigenvectors of the matrix
\[
\mathbf{A} = \begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix}.
\]
(This matrix was discussed in Subsection 1.1.)


Solution

We wish to find those non-zero vectors v that satisfy the equation

\[ \mathbf{Av} = \lambda\mathbf{v}, \qquad (2.3) \]

for some number $\lambda$. Writing $\mathbf{v} = [x \quad y]^T$, we have

\[
\begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \lambda \begin{pmatrix} x \\ y \end{pmatrix},
\]
i.e. x and y must satisfy the simultaneous linear equations

\[
\begin{aligned}
3x + 2y &= \lambda x, \\
x + 4y &= \lambda y.
\end{aligned}
\]

We are interested in only non-zero vectors v, so x and y are not both zero.
These equations will have such a solution for only certain values of $\lambda$ (the
eigenvalues of the matrix), and our first task is to find these values of $\lambda$.
The above equations are a pair of linear equations in the unknowns x and y,
where $\lambda$ is a constant that has yet to be determined.
Transferring the terms $\lambda x$ and $\lambda y$ to the left-hand side, we obtain

\[
\begin{cases}
(3 - \lambda)x + 2y = 0, \\
x + (4 - \lambda)y = 0,
\end{cases}
\qquad (2.4)
\]
which can be written as
\[ (\mathbf{A} - \lambda\mathbf{I})\mathbf{v} = \mathbf{0}. \qquad (2.5) \]


It is convenient to refer to Equations (2.4) as the eigenvector equations.


Do not lose sight of the fact that we require $\mathbf{v} \neq \mathbf{0}$. In Unit 9 we saw that if
such a non-zero solution v exists, then the matrix $\mathbf{A} - \lambda\mathbf{I}$ in Equation (2.5)
must be non-invertible, so $\det(\mathbf{A} - \lambda\mathbf{I}) = 0$, i.e.

\[
\begin{vmatrix} 3 - \lambda & 2 \\ 1 & 4 - \lambda \end{vmatrix} = 0. \qquad (2.6)
\]


Our conclusion at this stage is that Equation (2.3) can hold for only certain
values of $\lambda$, and these values must satisfy Equation (2.6). Expanding the
determinant in Equation (2.6) gives

\[ (3 - \lambda)(4 - \lambda) - 2 = 0, \]
which simplifies to
\[ \lambda^2 - 7\lambda + 10 = 0. \]
Since $\lambda^2 - 7\lambda + 10 = (\lambda - 5)(\lambda - 2)$, we deduce that $\lambda = 5$ or $\lambda = 2$.


Thus it is only when $\lambda = 5$ or $\lambda = 2$ that Equation (2.3) has a non-zero
solution v, i.e. 5 and 2 are the only eigenvalues of A.


In order to find the corresponding eigenvectors, we substitute each of these
eigenvalues into Equations (2.4) and solve the resulting eigenvector equations,
as follows.


$\lambda = 5$: The eigenvector equations (2.4) become $-2x + 2y = 0$ and
$x - y = 0$, and both are equivalent to the single equation $y = x$. It follows
that an eigenvector corresponding to $\lambda = 5$ is $[1 \quad 1]^T$.


$\lambda = 2$: The eigenvector equations (2.4) become $x + 2y = 0$ and
$x + 2y = 0$, and both are equivalent to the single equation $-2y = x$. It
follows that an eigenvector corresponding to $\lambda = 2$ is $[-2 \quad 1]^T$.


Thus $[1 \quad 1]^T$ is an eigenvector of A with corresponding eigenvalue 5, and
$[-2 \quad 1]^T$ is an eigenvector of A with corresponding eigenvalue 2.


In this case, the matrix A has two distinct eigenvalues, and these correspond
to two linearly independent eigenvectors.
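
Because the two eigenvector equations reduce to a single equation, an eigenvector can be read off from whichever row of $\mathbf{A} - \lambda\mathbf{I}$ is non-zero. The sketch below does this for a 2 × 2 matrix; the function name `eigenvector_2x2` is hypothetical, and the code assumes that the number passed in really is an eigenvalue of the matrix.

```python
def eigenvector_2x2(m, lam):
    """Return one eigenvector of [[a, b], [c, d]] for the eigenvalue lam."""
    (a, b), (c, d) = m
    if b != 0 or a - lam != 0:
        # First equation (a - lam) x + b y = 0 is satisfied by x = b, y = lam - a.
        return [b, lam - a]
    if c != 0 or d - lam != 0:
        # Second equation c x + (d - lam) y = 0 is satisfied by x = lam - d, y = c.
        return [lam - d, c]
    # A - lam I is the zero matrix, so every non-zero vector is an eigenvector.
    return [1, 0]
```

For the matrix of Example 2.2, `eigenvector_2x2([[3, 2], [1, 4]], 5)` gives [2, 2] and `eigenvector_2x2([[3, 2], [1, 4]], 2)` gives [2, -1]. These are multiples of the eigenvectors [1 1] and [-2 1] found above, so they represent the same eigenvectors.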


In Example 2.2 we found an eigenvector $[1 \quad 1]^T$ corresponding to the
eigenvalue 5, and an eigenvector $[-2 \quad 1]^T$ corresponding to the eigenvalue 2, but
these vectors are not unique. We could have chosen $[2 \quad 2]^T$ as the
eigenvector corresponding to the eigenvalue 5, and $[6 \quad -3]^T$ as the eigenvector
corresponding to the eigenvalue 2. However, all the eigenvectors corresponding
to 5 are multiples of $[1 \quad 1]^T$, and all the eigenvectors corresponding to 2
are multiples of $[-2 \quad 1]^T$. Thus, in a sense, $[1 \quad 1]^T$ and $[2 \quad 2]^T$ represent
the ‘same’ eigenvector, as do $[-2 \quad 1]^T$ and $[6 \quad -3]^T$.


*Exercise 2.1 


Use the method in Example 2.2 to find the eigenvalues and corresponding
eigenvectors of the matrix $\mathbf{A} = \begin{pmatrix} 5 & 2 \\ 2 & 5 \end{pmatrix}$.
(To see that Equations (2.4) can be written as Equation (2.5), notice that
Equation (2.3) can be rewritten as $\mathbf{Av} = \lambda\mathbf{Iv}$, where
I is the 2 × 2 identity matrix, which in turn can be
rewritten as $\mathbf{Av} - \lambda\mathbf{Iv} = \mathbf{0}$
and hence as $(\mathbf{A} - \lambda\mathbf{I})\mathbf{v} = \mathbf{0}$.)

(Recall that $\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc$.)

(The equation $\det(\mathbf{A} - \lambda\mathbf{I}) = 0$ is called the
characteristic equation of A.)

(The matrix $\mathbf{A} - \lambda\mathbf{I}$, i.e. $\begin{pmatrix} 3 - \lambda & 2 \\ 1 & 4 - \lambda \end{pmatrix}$,
is non-invertible, as stated above, so the equivalent
eigenvector equations are linearly dependent, hence
the single equation.)


As we shall see shortly, there is a result involving the diagonal elements of 
a matrix that provides a useful check on calculations. To state this result 
succinctly, we shall need the following definition. 


Definition 


The sum of the elements on the leading diagonal of a square matrix A 
is known as the trace of A and is written as tr A. 


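In Example 2.2 we found the eigenvalues of the matrix

\[ A = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix} \]

by solving the equation $\det(A - \lambda I) = 0$, i.e.

\[ \begin{vmatrix} 3-\lambda & 2 \\ 1 & 4-\lambda \end{vmatrix} = 0, \]

which gives a quadratic equation in $\lambda$.


More generally, we find the eigenvalues of the $2 \times 2$ matrix

\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]

by solving the equation $\det(A - \lambda I) = 0$, i.e.

\[ \begin{vmatrix} a-\lambda & b \\ c & d-\lambda \end{vmatrix} = 0, \]

for $\lambda$. Expanding this determinant gives

\[ (a-\lambda)(d-\lambda) - bc = 0, \]

which simplifies to the quadratic equation

\[ \lambda^2 - (a+d)\lambda + (ad - bc) = 0. \qquad (2.7) \]


The roots of Equation (2.7), $\lambda_1$ and $\lambda_2$ say, are the eigenvalues of $A$, and
we see that

\[ \lambda_1\lambda_2 = ad - bc = \det A \quad\text{and}\quad \lambda_1 + \lambda_2 = a + d = \operatorname{tr} A. \qquad (2.8) \]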


Equation (2.7) is called the characteristic equation of A. It can have 
distinct real roots, a repeated real root or complex roots, depending on the 
values of a, b, c and d. We investigate these three possibilities in the next 
subsection. 


Using Example 2.2 and the above discussion as a model, we can now give 
the following procedures for determining the eigenvalues and eigenvectors of 
a given 2 x 2 matrix. 


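Procedure 2.1 Eigenvalues of a $2 \times 2$ matrix


Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. To find the eigenvalues of $A$, write down the
characteristic equation $\det(A - \lambda I) = 0$. Expand this as

\[ \begin{vmatrix} a-\lambda & b \\ c & d-\lambda \end{vmatrix} = \lambda^2 - (a+d)\lambda + (ad - bc) = 0. \qquad (2.9) \]


Solve this quadratic equation for $\lambda$.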


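Equation (2.7) can be written in the form

\[ (\lambda - \lambda_1)(\lambda - \lambda_2) = 0, \]

so

\[ \lambda^2 - (\lambda_1 + \lambda_2)\lambda + \lambda_1\lambda_2 = 0, \]

therefore

\[ \lambda_1\lambda_2 = ad - bc, \qquad \lambda_1 + \lambda_2 = a + d. \]


Using Equations (2.8) to determine the characteristic equation leads to the same
quadratic equation.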
Procedure 2.2 Eigenvectors of a $2 \times 2$ matrix


Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. To find an eigenvector corresponding to the
eigenvalue $\lambda$, write down the eigenvector equations

\[ \begin{cases} (a-\lambda)x + by = 0, \\ cx + (d-\lambda)y = 0. \end{cases} \qquad (2.10) \]


These equations reduce to a single equation of the form $py = qx$, with
solution $x = p$, $y = q$, so $[p \;\; q]^T$ is an eigenvector. Any non-zero
multiple $[kp \;\; kq]^T$ is also an eigenvector corresponding to the same
eigenvalue.
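
Procedures 2.1 and 2.2 can be sketched in a few lines of Python. This is only an illustration and is not the course's computer algebra package: the function name `eigen_2x2` is an assumption made here, and the eigenvector is read off from the first eigenvector equation as $x = b$, $y = \lambda - a$, which is one convenient choice among the possible multiples.

```python
import cmath


def eigen_2x2(a, b, c, d):
    """Return [(eigenvalue, eigenvector), ...] for the matrix [[a, b], [c, d]].

    Hypothetical helper: it follows Procedure 2.1 (solve the characteristic
    quadratic) and Procedure 2.2 (solve one eigenvector equation).
    """
    if b == 0 and c == 0:
        # Diagonal matrix: the eigenvalues are a and d, with the axes as eigenvectors.
        return [(a, (1, 0)), (d, (0, 1))]
    trace, det = a + d, a * d - b * c
    # cmath.sqrt copes with a negative discriminant (complex eigenvalues).
    root = cmath.sqrt(trace * trace - 4 * det)
    pairs = []
    for lam in ((trace + root) / 2, (trace - root) / 2):
        if b != 0:
            # (a - lam) x + b y = 0 is satisfied by x = b, y = lam - a.
            vector = (b, lam - a)
        else:
            # Here c != 0: c x + (d - lam) y = 0 is satisfied by x = lam - d, y = c.
            vector = (lam - d, c)
        pairs.append((lam, vector))
    return pairs


# The matrix [[3, 2], [1, 4]], whose eigenvalues are 5 and 2.
for lam, vector in eigen_2x2(3, 2, 1, 4):
    print(lam, vector)
```

For this matrix the sketch returns the vectors $(2, 2)$ and $(2, -1)$, which are multiples of $[1 \;\; 1]^T$ and $[-2 \;\; 1]^T$ respectively. The eigenvalues come back as Python complex numbers with zero imaginary part.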


Example 2.3 


Find the eigenvalues and corresponding eigenvectors of the matrix

\[ A = \begin{bmatrix} 2 & 3 \\ 2 & 1 \end{bmatrix}, \]

following Procedures 2.1 and 2.2.


Solution
The characteristic equation is

\[ \begin{vmatrix} 2-\lambda & 3 \\ 2 & 1-\lambda \end{vmatrix} = (2-\lambda)(1-\lambda) - 6 = 0, \]

which simplifies to $\lambda^2 - 3\lambda - 4 = 0$. Using the formula for solving a quadratic
equation, we deduce that the eigenvalues are $\lambda = 4$ and $\lambda = -1$.
(Alternatively, you may have noticed that $\lambda^2 - 3\lambda - 4 = (\lambda - 4)(\lambda + 1)$.)


$\lambda = 4$: The eigenvector equations become $-2x + 3y = 0$ and $2x - 3y = 0$,
which reduce to the single equation $3y = 2x$ (so $p = 3$ and $q = 2$ in
Procedure 2.2). Thus an eigenvector corresponding to $\lambda = 4$ is $[3 \;\; 2]^T$.


$\lambda = -1$: The eigenvector equations become $3x + 3y = 0$ and $2x + 2y = 0$,
which reduce to the single equation $-y = x$ (so $p = -1$ and $q = 1$ in
Procedure 2.2). Thus an eigenvector corresponding to $\lambda = -1$ is $[-1 \;\; 1]^T$. ■


In Example 2.3 we have two distinct real eigenvalues, and these correspond 
to two linearly independent eigenvectors. 


*Exercise 2.2 


Use Procedures 2.1 and 2.2 to find the eigenvalues and corresponding eigen- 
vectors of each of the following matrices A. 


@ a=[7 3| 


Oo Be =| 


From Equations (2.8), the sum of the eigenvalues is $a + d = \operatorname{tr} A$ and the
product is $ad - bc = \det A$. (Similar results are true for all square
matrices, of any size.) It is useful to check that these properties hold
whenever you have calculated the eigenvalues of a given matrix. If they do
not hold, then you have made a mistake, which you should rectify before
proceeding to calculate the eigenvectors.
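
The trace and determinant check is easy to automate. The following Python sketch is illustrative only: the function name `passes_trace_det_check` and the tolerance are assumptions made here, and the second call takes the matrix with eigenvalues 4 and $-1$ to be $\begin{bmatrix} 2 & 3 \\ 2 & 1 \end{bmatrix}$, as implied by the eigenvector equations quoted in Example 2.3.

```python
def passes_trace_det_check(a, b, c, d, lam1, lam2, tol=1e-9):
    """Check Equations (2.8) for the 2x2 matrix [[a, b], [c, d]]."""
    sum_ok = abs((lam1 + lam2) - (a + d)) < tol              # sum equals tr A
    product_ok = abs(lam1 * lam2 - (a * d - b * c)) < tol     # product equals det A
    return sum_ok and product_ok


print(passes_trace_det_check(3, 2, 1, 4, 5, 2))    # eigenvalues 5 and 2
print(passes_trace_det_check(2, 3, 2, 1, 4, -1))   # eigenvalues 4 and -1
print(passes_trace_det_check(2, 3, 2, 1, 4, 1))    # a deliberately wrong eigenvalue
```

The first two calls print `True` and the third prints `False`, since $4 + 1 \neq \operatorname{tr} A = 3$. Passing the check does not prove that the eigenvalues are right, but failing it shows that a mistake has been made.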
*Exercise 2.3


Verify the properties in Equations (2.8) for the matrices in: 
(a) Examples 2.2 and 2.3; (b) Exercises 2.1 and 2.2. 


2.2 Three types of eigenvalue 


The characteristic equation of the matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is a quadratic
equation, and this has distinct real, repeated real or distinct complex roots,
depending on whether the discriminant of the quadratic equation is positive,
zero or negative. We illustrate each of these three cases below.
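
As a rough illustration of this classification, the discriminant of Equation (2.7) is $(a+d)^2 - 4(ad - bc)$, and its sign decides the case. The function name `eigenvalue_type` below is an assumption made for this sketch, and the exact comparison with zero is reliable only for entries (such as integers) that are represented exactly.

```python
def eigenvalue_type(a, b, c, d):
    """Classify the eigenvalues of [[a, b], [c, d]] by the sign of the discriminant."""
    discriminant = (a + d) ** 2 - 4 * (a * d - b * c)
    if discriminant > 0:
        return "distinct real"
    if discriminant == 0:
        return "repeated real"
    return "complex conjugate pair"


print(eigenvalue_type(3, 2, 1, 4))    # discriminant 9
print(eigenvalue_type(2, 1, 0, 2))    # discriminant 0
print(eigenvalue_type(0, -1, 1, 0))   # discriminant -4
```

The three calls print `distinct real`, `repeated real` and `complex conjugate pair` respectively.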
Distinct real eigenvalues 
In the following exercises, the eigenvalues are real and distinct. 


*Exercise 2.4 


Calculate the eigenvalues and corresponding eigenvectors of the matrix
$A = \begin{bmatrix} p & 0 \\ 0 & q \end{bmatrix}$ with $p \neq q$. Check that your answers agree with those
obtained by geometric considerations in Section 1 (see Table 1.1, page 60).


Exercise 2.5 


Calculate the eigenvalues and corresponding eigenvectors of the matrix

\[ A = \begin{bmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{bmatrix}. \]


Notice that whenever we have two distinct real eigenvalues, we also have 
two linearly independent eigenvectors. 


Repeated eigenvalue 


In the following exercise, the eigenvalue is repeated. 


*Exercise 2.6 


Calculate the eigenvalues and corresponding eigenvectors of the following
matrices.


(a) $\begin{bmatrix} a & 0 \\ 0 & a \end{bmatrix}$, where $a \neq 0$. (b) $\begin{bmatrix} a & 1 \\ 0 & a \end{bmatrix}$, where $a \neq 0$.


Complex eigenvalues 


For $2 \times 2$ matrices, complex eigenvalues arise when there are no fixed
directions under the corresponding linear transformation as, for example, in
an anticlockwise rotation through $\pi/2$ about the origin (see Table 1.1 and
the discussion beneath the table). They can occur for matrices of all sizes,
but for real matrices they always occur in complex conjugate pairs (i.e. if
$\lambda_1 = a + ib$ is an eigenvalue, then $\lambda_2 = a - ib$ is also an eigenvalue). The
following example illustrates one of the simplest cases.


See Equation (2.9). 


This matrix is associated with 
the migration problem 
discussed in the Introduction. 


This is true in general, though 
we do not prove it here. 


Although we discuss complex 
eigenvalues, in this course the 
elements of a matrix A will 
always be real, so that A is a 
real matrix. 
Example 2.4
Calculate the eigenvalues and corresponding eigenvectors of the matrix

\[ A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}. \]

Solution


The characteristic equation is $\lambda^2 + 1 = 0$. This equation has no real
solutions, so there are no real eigenvalues. However, it has complex solutions

\[ \lambda = i \quad\text{and}\quad \lambda = -i, \quad\text{where } i^2 = -1. \]

Thus the matrix $A$ has the complex eigenvalues $i$ and $-i$.


The eigenvector equations are $-\lambda x - y = 0$ and $x - \lambda y = 0$ (using
Equations (2.10)).


$\lambda = i$: The eigenvector equations become $-ix - y = 0$ and $x - iy = 0$,
which reduce to the single equation $y = -ix$ (so $p = 1$ and $q = -i$ in
Procedure 2.2). Thus an eigenvector corresponding to the eigenvalue $\lambda = i$ is
$[1 \;\; -i]^T$.


$\lambda = -i$: The eigenvector equations become $ix - y = 0$ and $x + iy = 0$,
which reduce to the single equation $y = ix$ (so $p = 1$ and $q = i$ in
Procedure 2.2). Thus an eigenvector corresponding to the eigenvalue $\lambda = -i$ is
$[1 \;\; i]^T$. ■
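
Complex eigenvectors can be checked in exactly the same way as real ones, by multiplying out $A\mathbf{v}$ and comparing with $\lambda\mathbf{v}$. The sketch below uses Python's built-in complex numbers (where `1j` denotes $i$); the helper name `is_eigenpair` and its tolerance are assumptions made for illustration.

```python
def is_eigenpair(A, lam, v, tol=1e-12):
    """True if A v = lam v; entries may be Python complex numbers."""
    Av = [sum(a * x for a, x in zip(row, v)) for row in A]
    return all(abs(left - lam * x) < tol for left, x in zip(Av, v))


A = [[0, -1], [1, 0]]                   # anticlockwise rotation through pi/2
print(is_eigenpair(A, 1j, [1, -1j]))    # eigenvalue i with [1, -i]
print(is_eigenpair(A, -1j, [1, 1j]))    # eigenvalue -i with [1, i]
print(is_eigenpair(A, 1j, [1, 1j]))     # mismatched eigenvalue and vector
```

The first two calls print `True` and the third prints `False`.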


In Example 2.4 the eigenvectors are complex, but, nevertheless, are linearly 
independent. It remains true that any two eigenvectors corresponding to 
distinct eigenvalues, whether real or complex, are linearly independent. 
*Exercise 2.7 


Calculate the eigenvalues and corresponding eigenvectors of the matrix 


ar 


2.3 Some results on eigenvalues 


In this subsection we list some general results that will be needed later. 
Although we introduce them in the context of 2 x 2 matrices, they hold for 
square matrices of any size. 


We first consider the eigenvalues of various types of matrix: triangular, 
symmetric and non-invertible. 


Triangular matrices 


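A matrix is triangular if all the entries above (or below) the leading diagonal
are 0. Thus a $2 \times 2$ triangular matrix has one of the forms

\[ \begin{bmatrix} a & b \\ 0 & d \end{bmatrix} \text{ (upper triangular)}, \qquad \begin{bmatrix} a & 0 \\ c & d \end{bmatrix} \text{ (lower triangular)}, \qquad \begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix} \text{ (diagonal)}. \]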
Check:
sum $= \operatorname{tr} A = 0$;
product $= \det A = 1$.


The second equation is $i$ times the first.


The second equation is $-i$ times the first.


See Unit 9.
The above triangular matrices all have characteristic equation

\[ (a - \lambda)(d - \lambda) = 0, \]

and eigenvalues $\lambda = a$ and $\lambda = d$.


Thus the eigenvalues of a triangular matrix are the diagonal entries.


*Exercise 2.8 


Find the eigenvalues and corresponding eigenvectors of the upper triangular 


ex 1 3 
matrix | 5 |: 


Symmetric matrices 


A matrix is symmetric if it is equal to its transpose, that is, if the entries
are symmetric about the leading diagonal (see Unit 9). Thus a $2 \times 2$ symmetric matrix
has the form

\[ A = \begin{bmatrix} a & b \\ b & d \end{bmatrix}. \]

It has characteristic equation

\[ \lambda^2 - (a+d)\lambda + (ad - b^2) = 0, \]

and eigenvalues

\[ \lambda = \tfrac{1}{2}\left(a + d \pm \sqrt{(a+d)^2 - 4(ad - b^2)}\right). \]

(In this course $A$ is always a real matrix, so $a$, $b$ and $d$ are real numbers.)


The discriminant is

\[ (a+d)^2 - 4(ad - b^2) = (a^2 + 2ad + d^2) - 4ad + 4b^2 = (a-d)^2 + 4b^2, \]

which is the sum of two squares and therefore cannot be negative.


It follows that the eigenvalues of a symmetric matrix are real.
Exercise 2.9


Under what circumstances can a symmetric matrix $\begin{bmatrix} a & b \\ b & d \end{bmatrix}$ have a repeated
eigenvalue?


Non-invertible matrices 


A matrix is non-invertible if and only if its determinant is 0 (see Unit 9). Thus a
$2 \times 2$ non-invertible matrix has the form

\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \quad\text{where } ad - bc = 0. \]


However, we know that if $\lambda_1$ and $\lambda_2$ are the eigenvalues of $A$, then
$\lambda_1\lambda_2 = \det A = 0$ (see Equations (2.8)).


It follows that a matrix is non-invertible if and only if at least one of its
eigenvalues is 0. Also, a matrix is invertible if and only if all its eigenvalues
are non-zero.


For example, if $A = \begin{bmatrix} 2 & 1 \\ 4 & 2 \end{bmatrix}$, then $\det A = 0$ and hence 0 is an eigenvalue
of $A$, as we saw on page 59 (below Exercise 1.2). Since $\lambda_1 + \lambda_2 = \operatorname{tr} A = 4$, the
other eigenvalue is 4.
Non-invertible matrices

A square matrix $A$ is non-invertible if and only if any of the following
equivalent conditions holds:

• its determinant is zero;

• its rows are linearly dependent;

• its columns are linearly dependent;

• the equation $A\mathbf{x} = \mathbf{0}$ has a non-zero solution;

• at least one of its eigenvalues is zero.

The third condition follows immediately from the second, since $\det A = \det(A^T)$.


Eigenvalues: summary


The eigenvalues of a triangular matrix are the diagonal entries. 
The eigenvalues of a real symmetric matrix are real. 
The sum of the eigenvalues of A is tr A. 


The product of the eigenvalues of A is det A. 


Exercise 2.10 
Without solving the characteristic equation, what can you say about the 
eigenvalues of each of the following matrices? 


oa-[E 3] ma-[% 3] oa-[8 8 


2.4 Eigenvalues of related matrices 


For the rest of this section, we compare the eigenvalues of a matrix A with 
the eigenvalues of some related matrices. The results of our investigations 
are needed in Section 4. The following exercise leads our discussion. 


Exercise 2.11

Let $A = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}$. In Example 2.2 you saw that $[1 \;\; 1]^T$ is an eigenvector of
$A$ corresponding to the eigenvalue 5, and that $[-2 \;\; 1]^T$ is an eigenvector
corresponding to the eigenvalue 2.


(a) Evaluate the following matrices. In each case, solve the characteristic
equation, determine the eigenvalues, and compare these eigenvalues with
the eigenvalues of $A$.


(i) $A^2$ (ii) $A^{-1}$ (iii) $A + 2I$ (iv) $(A - 4I)^{-1}$ (v) $3A$

(b) Verify that the given eigenvectors of $A$ are also eigenvectors of the
matrices in part (a).


Exercise 2.11 illustrates some general results. 
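If $\mathbf{v}$ is an eigenvector of any matrix $A$ with eigenvalue $\lambda$, then $A\mathbf{v} = \lambda\mathbf{v}$.
It follows that

\[ A^2\mathbf{v} = A(A\mathbf{v}) = A(\lambda\mathbf{v}) = \lambda(A\mathbf{v}) = \lambda(\lambda\mathbf{v}) = \lambda^2\mathbf{v}, \]

which shows that $\mathbf{v}$ is an eigenvector of $A^2$, with eigenvalue $\lambda^2$.


More generally, if $\mathbf{v}$ is an eigenvector of a matrix $A$ with eigenvalue $\lambda$,
then $\mathbf{v}$ is also an eigenvector of $A^k$ (for any positive integer $k$), with
eigenvalue $\lambda^k$.


If $\mathbf{v}$ is an eigenvector of any matrix $A$ with eigenvalue $\lambda$, then $A\mathbf{v} = \lambda\mathbf{v}$.
It follows that (provided that $A$ is invertible, so $A^{-1}$ exists)

\[ A^{-1}(A\mathbf{v}) = A^{-1}(\lambda\mathbf{v}) = \lambda A^{-1}\mathbf{v}, \]

so $\mathbf{v} = \lambda A^{-1}\mathbf{v}$. Dividing by $\lambda$ (which cannot be zero since $A$ is
invertible) gives

\[ A^{-1}\mathbf{v} = (1/\lambda)\mathbf{v}, \]

which shows that $\mathbf{v}$ is an eigenvector of $A^{-1}$, with eigenvalue $1/\lambda$.


If $\mathbf{v}$ is an eigenvector of any matrix $A$ with eigenvalue $\lambda$, then $A\mathbf{v} = \lambda\mathbf{v}$.
If $p$ is any number, it follows that

\[ (A - pI)\mathbf{v} = A\mathbf{v} - pI\mathbf{v} = \lambda\mathbf{v} - p\mathbf{v} = (\lambda - p)\mathbf{v}, \]

which shows that $\mathbf{v}$ is an eigenvector of $A - pI$, with eigenvalue $\lambda - p$.


If the number $p$ is not an eigenvalue of $A$, then the eigenvalues $\lambda - p$ of
$A - pI$ are non-zero, so $A - pI$ is invertible. Therefore we can multiply
both sides of the equation

\[ (A - pI)\mathbf{v} = (\lambda - p)\mathbf{v} \]

on the left by $(A - pI)^{-1}$ to obtain

\[ \mathbf{v} = (A - pI)^{-1}(\lambda - p)\mathbf{v}, \]

and, dividing by $\lambda - p$,

\[ (A - pI)^{-1}\mathbf{v} = (\lambda - p)^{-1}\mathbf{v}. \]


This shows that $\mathbf{v}$ is also an eigenvector of $(A - pI)^{-1}$, with eigenvalue
$(\lambda - p)^{-1}$.

If $\mathbf{v}$ is an eigenvector of any matrix $A$ with eigenvalue $\lambda$, then $A\mathbf{v} = \lambda\mathbf{v}$.
It follows that for any number $p$, $p(A\mathbf{v}) = p(\lambda\mathbf{v})$, so $(pA)\mathbf{v} = (p\lambda)\mathbf{v}$, which
shows that $\mathbf{v}$ is an eigenvector of $pA$ with eigenvalue $p\lambda$.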


Eigenvalues and eigenvectors of associated matrices


If $A$ is an arbitrary matrix and $\lambda$ is one of its eigenvalues, then:

• $\lambda^k$ is an eigenvalue of $A^k$ for any positive integer $k$;

• if $A$ is invertible, then $\lambda^{-1}$ is an eigenvalue of $A^{-1}$;

• $\lambda - p$ is an eigenvalue of $A - pI$, for any number $p$;

• $(\lambda - p)^{-1}$ is an eigenvalue of $(A - pI)^{-1}$, for any number $p$ that is
not an eigenvalue of $A$;

• $p\lambda$ is an eigenvalue of $pA$ for any number $p$.


In each case, an eigenvector of $A$ is also an eigenvector of the associated
matrix.


We shall need these results in Section 4 and in later units of the course.
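
These results can be checked numerically for the matrix $\begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}$, which has eigenvalues 5 and 2 with eigenvectors $[1 \;\; 1]^T$ and $[-2 \;\; 1]^T$. The following Python sketch uses exact fractions so that equality tests are safe; the helper names and the choice $p = 4$ are assumptions made for illustration.

```python
from fractions import Fraction


def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def shift(M, p):
    """Return M - pI for a 2x2 matrix M."""
    return [[M[0][0] - p, M[0][1]], [M[1][0], M[1][1] - p]]


def inverse(M):
    """Inverse of an invertible 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]


def has_eigenpair(M, lam, v):
    return mat_vec(M, v) == [lam * x for x in v]


A = [[Fraction(3), Fraction(2)], [Fraction(1), Fraction(4)]]
p = 4  # any number that is not an eigenvalue of A

for lam, v in [(Fraction(5), [1, 1]), (Fraction(2), [-2, 1])]:
    assert has_eigenpair(mat_mul(A, A), lam ** 2, v)                    # A^2
    assert has_eigenpair(inverse(A), 1 / lam, v)                        # A^-1
    assert has_eigenpair(shift(A, p), lam - p, v)                       # A - pI
    assert has_eigenpair(inverse(shift(A, p)), 1 / (lam - p), v)        # (A - pI)^-1
    assert has_eigenpair([[3 * x for x in row] for row in A], 3 * lam, v)  # 3A
print("all five results hold for both eigenvectors")
```

A check of this kind confirms the results for one matrix only; the general statements rest on the algebraic arguments given in the text.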
*Exercise 2.12


(a) The eigenvalues of $A = \begin{bmatrix} 2 & 3 \\ 2 & 1 \end{bmatrix}$ are 4 and $-1$. Write down the
eigenvalues of each of the following.

(i) $A^2$ (ii) $A^{-1}$ (iii) $A - 6I$ (iv) $(A + 3I)^{-1}$


(b) What can you say about the eigenvalues of $A - 4I$? What can you say
about the inverse of $A - 4I$?


End-of-section Exercises 


Exercise 2.13 


(a) Find the eigenvalues and corresponding eigenvectors of the matrix 


1 2 . ; ; 
A= F 7 ak Write down the eigenvalues and corresponding eigen- 
vectors of the matrix A!®. 
(b) Write down the eigenvalues and corresponding eigenvectors of the 
matrix ’ - 
a 3 _9|° 
(Hint: Compare this matrix with the matrix A in part (a).) 


Exercise 2.14 


Let $\theta$ be an angle that is not an integer multiple of $\pi$. Calculate the
eigenvalues and eigenvectors of the matrix

\[ A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}, \]

which represents an anticlockwise rotation through the angle $\theta$.


3 Eigenvalues and eigenvectors of 3 x 3 
matrices 


In this section we extend the ideas of Section 2 to deal with 3 x 3 matrices. 
In fact it is possible to extend the treatment to n x n matrices, and with 
this in mind it is convenient to use the notation x1, x2 and x3 (rather than 
x, y and z). 


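So we are interested in finding the eigenvalues and corresponding eigenvectors
of a matrix such as

\[ \begin{bmatrix} -2 & 1 & 0 \\ 1 & -2 & 1 \\ 0 & 1 & -2 \end{bmatrix}. \]


As in the case of $2 \times 2$ matrices, we can verify that a given vector is an
eigenvector. For example, $[1 \;\; 0 \;\; -1]^T$ is an eigenvector of the above matrix,
since

\[ \begin{bmatrix} -2 & 1 & 0 \\ 1 & -2 & 1 \\ 0 & 1 & -2 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} = \begin{bmatrix} -2 \\ 0 \\ 2 \end{bmatrix} = -2 \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \]

and the corresponding eigenvalue is $-2$.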
Exercise 3.1


Verify that $\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}$ are eigenvectors of
$A = \begin{bmatrix} 5 & 0 & 0 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{bmatrix}$, and find
the corresponding eigenvalues.


The above exercise should have convinced you that it is easy to verify that
a given vector is an eigenvector, but how are we to find such eigenvectors?
Moreover, how are we to deal with the more general case of an $n \times n$ matrix?
This section examines these questions.


3.1 Basic method 


In this subsection we look at a method for finding the eigenvalues and the 
corresponding eigenvectors of an arbitrary 3 x 3 matrix. Consider the fol- 
lowing example. 

Example 3.1 

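Find the eigenvalues and corresponding eigenvectors of the matrix

\[ A = \begin{bmatrix} 3 & 2 & 2 \\ 2 & 2 & 0 \\ 2 & 0 & 4 \end{bmatrix}. \]


Solution


We wish to find those non-zero vectors $\mathbf{v}$ that satisfy the equation

\[ A\mathbf{v} = \lambda\mathbf{v} \]

for some number $\lambda$. Writing $\mathbf{v} = [x_1 \;\; x_2 \;\; x_3]^T$, we have

\[ \begin{bmatrix} 3 & 2 & 2 \\ 2 & 2 & 0 \\ 2 & 0 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \lambda \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \]

which gives

\[ \begin{aligned} (3-\lambda)x_1 + 2x_2 + 2x_3 &= 0, \\ 2x_1 + (2-\lambda)x_2 &= 0, \\ 2x_1 + (4-\lambda)x_3 &= 0, \end{aligned} \qquad (3.1) \]

(these are the eigenvector equations), which can be written as
$(A - \lambda I)\mathbf{v} = \mathbf{0}$. We are interested in non-zero
solutions of these equations (i.e. solutions in which $x_1$, $x_2$ and $x_3$ are not
all 0). The condition for such solutions to exist is that the determinant of
the coefficient matrix $A - \lambda I$ is 0, i.e.

\[ \begin{vmatrix} 3-\lambda & 2 & 2 \\ 2 & 2-\lambda & 0 \\ 2 & 0 & 4-\lambda \end{vmatrix} = 0. \]

Expanding this determinant by the first row (see Unit 9) gives

\[ (3-\lambda)\begin{vmatrix} 2-\lambda & 0 \\ 0 & 4-\lambda \end{vmatrix} - 2\begin{vmatrix} 2 & 0 \\ 2 & 4-\lambda \end{vmatrix} + 2\begin{vmatrix} 2 & 2-\lambda \\ 2 & 0 \end{vmatrix} = 0, \]

hence

\[ (3-\lambda)(2-\lambda)(4-\lambda) - 2(8 - 2\lambda) - 2(4 - 2\lambda) = 0, \]

which simplifies to the cubic equation

\[ \lambda^3 - 9\lambda^2 + 18\lambda = 0. \]

This is the characteristic equation of $A$. For a $3 \times 3$ matrix $A$, it will be a cubic
equation.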
We must now solve this equation. In general, it is difficult to solve a cubic
equation algebraically, unless you use a computer algebra package. However,
if you can spot one of the roots, then the task becomes considerably easier.
In this case

\[ \lambda^3 - 9\lambda^2 + 18\lambda = \lambda(\lambda^2 - 9\lambda + 18), \]

so $\lambda = 0$ is one of the roots. The others are obtained by solving the quadratic
equation $\lambda^2 - 9\lambda + 18 = 0$ to give the roots 3 and 6. So the eigenvalues are
0, 3 and 6.
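
When the entries of a matrix are integers, one way of 'spotting' roots is simply to evaluate $\det(A - \lambda I)$ at a range of small integers. The Python sketch below does this for the matrix of this example; the helper names `det3` and `char_value`, and the search range from $-10$ to 10, are assumptions made for illustration. A search of this kind finds integer eigenvalues only.

```python
def det3(M):
    """Determinant of a 3x3 matrix, expanded by the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def char_value(A, lam):
    """Value of det(A - lam I) for a 3x3 matrix A."""
    shifted = [[A[r][c] - (lam if r == c else 0) for c in range(3)] for r in range(3)]
    return det3(shifted)


A = [[3, 2, 2], [2, 2, 0], [2, 0, 4]]
integer_roots = [lam for lam in range(-10, 11) if char_value(A, lam) == 0]
print(integer_roots)  # [0, 3, 6]
```

Since a cubic equation has at most three roots, finding three integer roots in this way accounts for all the eigenvalues.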


In order to find the corresponding eigenvectors, we substitute each of these
eigenvalues into the eigenvector equations (3.1) as follows.


$\lambda = 6$: The matrix form of the eigenvector equations (3.1) is

\[ \begin{bmatrix} -3 & 2 & 2 \\ 2 & -4 & 0 \\ 2 & 0 & -2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}. \]


We saw in Subsection 2.3 that an equation of the form $A\mathbf{x} = \mathbf{0}$ has a
non-zero solution (and hence infinitely many solutions) if and only if the rows of
$A$ are linearly dependent. Here, this means that at least one of the following
equations may be obtained from the other two:

\[ \begin{aligned} -3x_1 + 2x_2 + 2x_3 &= 0, \\ 2x_1 - 4x_2 &= 0, \\ 2x_1 - 2x_3 &= 0. \end{aligned} \]


So we can find our desired solution by considering just two of the equations.
Using the second and third, we have $x_1 - 2x_2 = 0$ and $x_1 - x_3 = 0$. Putting
$x_3 = k$ and solving for $x_1$ and $x_2$, we obtain $x_1 = k$, $x_2 = \tfrac{1}{2}k$. Choosing $k = 2$
gives an answer avoiding fractions: $x_1 = 2$, $x_2 = 1$, $x_3 = 2$. This means that
$[2 \;\; 1 \;\; 2]^T$ is an eigenvector corresponding to the eigenvalue $\lambda = 6$.


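Alternatively, we may solve the eigenvector equations by Gaussian elimination.
In this case, we have the following augmented matrix.

\[ \left[\begin{array}{rrr|r} -3 & 2 & 2 & 0 \\ 2 & -4 & 0 & 0 \\ 2 & 0 & -2 & 0 \end{array}\right] \begin{array}{l} R_1 \\ R_2 \\ R_3 \end{array} \]


Stage 1(a) We reduce to zero the elements below the leading diagonal in
column 1.

\[ \begin{array}{l} \\ R_2 + \tfrac{2}{3}R_1 \\ R_3 + \tfrac{2}{3}R_1 \end{array} \left[\begin{array}{rrr|r} -3 & 2 & 2 & 0 \\ 0 & -\tfrac{8}{3} & \tfrac{4}{3} & 0 \\ 0 & \tfrac{4}{3} & -\tfrac{2}{3} & 0 \end{array}\right] \begin{array}{l} R_1 \\ R_{2a} \\ R_{3a} \end{array} \]


Stage 1(b) We reduce to zero the element below the leading diagonal in
column 2.

\[ \begin{array}{l} \\ \\ R_{3a} + \tfrac{1}{2}R_{2a} \end{array} \left[\begin{array}{rrr|r} -3 & 2 & 2 & 0 \\ 0 & -\tfrac{8}{3} & \tfrac{4}{3} & 0 \\ 0 & 0 & 0 & 0 \end{array}\right] \]


Stage 2 Back substitution gives $x_2 = \tfrac{1}{2}x_3$ and $x_1 = \tfrac{2}{3}(x_2 + x_3) = x_3$. We
are free to choose $x_3$ as we please, so putting $x_3 = 2$, we find that $[2 \;\; 1 \;\; 2]^T$
is an eigenvector (as before).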
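
The elimination above can be reproduced with exact fractions. The Python sketch below is tailored to this one matrix $A - 6I$: it assumes that the pivots are non-zero (so no row interchanges are needed) and that $x_3$ is the free variable, neither of which holds for every matrix.

```python
from fractions import Fraction

# Coefficient matrix A - 6I; the right-hand sides are all zero, so they are omitted.
M = [[Fraction(x) for x in row] for row in [[-3, 2, 2], [2, -4, 0], [2, 0, -2]]]

# Stage 1(a): clear column 1 below the leading diagonal.
for r in (1, 2):
    factor = M[r][0] / M[0][0]
    M[r] = [m - factor * p for m, p in zip(M[r], M[0])]

# Stage 1(b): clear column 2 below the leading diagonal.
factor = M[2][1] / M[1][1]
M[2] = [m - factor * p for m, p in zip(M[2], M[1])]

# Stage 2: back substitution with the free choice x3 = 2.
x3 = Fraction(2)
x2 = -M[1][2] * x3 / M[1][1]
x1 = -(M[0][1] * x2 + M[0][2] * x3) / M[0][0]

for row in M:
    print([str(entry) for entry in row])
print(x1, x2, x3)
```

The bottom row reduces to zeros, as it must when the rows of $A - \lambda I$ are linearly dependent, and the back substitution gives $x_1 = 2$, $x_2 = 1$, $x_3 = 2$.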


The other cases, corresponding to \ = 3 and A = 0, are dealt with in the 
same way. 
One way of solving the quadratic equation is to notice that
$\lambda^2 - 9\lambda + 18 = (\lambda - 3)(\lambda - 6)$.


While it may appear that we have three equations in the three unknowns $x_1$, $x_2$
and $x_3$, only two of the equations are distinct. You may recall a similar
occurrence in Exercise 1.7(c) of Unit 9.


We should check that these values satisfy the original equations.


Gaussian elimination is a more cumbersome process than the previous method,
but it is illuminating. At the final stage of the elimination process, the
bottom row of the augmented matrix consists entirely of zeros, and this must
be the case because the rows of $A - \lambda I$ are linearly dependent.
$\lambda = 3$: Substituting $\lambda = 3$ into Equations (3.1), the eigenvector equations
become $2x_2 + 2x_3 = 0$, $2x_1 - x_2 = 0$ and $2x_1 + x_3 = 0$. The first and second
equations reduce to $x_3 = -x_2$ and $x_2 = 2x_1$. It follows that $[1 \;\; 2 \;\; -2]^T$ is
an eigenvector corresponding to the eigenvalue 3. (Choose, for example, $x_1 = 1$,
then use this to calculate $x_2$ and $x_3$.)


$\lambda = 0$: Substituting $\lambda = 0$ into Equations (3.1), the eigenvector equations
become $3x_1 + 2x_2 + 2x_3 = 0$, $2x_1 + 2x_2 = 0$ and $2x_1 + 4x_3 = 0$. These
equations reduce to $x_2 = -x_1$ and $x_1 = -2x_3$. It follows that $[-2 \;\; 2 \;\; 1]^T$ is an
eigenvector corresponding to the eigenvalue 0. (Choose, for example, $x_3 = 1$,
and use this to calculate $x_1$ and $x_2$.) ■


In Example 3.1, we first found the eigenvalues by solving a cubic equation,
then used these eigenvalues and the eigenvector equations to find the
corresponding eigenvectors. We found the eigenvalues of the matrix

\[ A = \begin{bmatrix} 3 & 2 & 2 \\ 2 & 2 & 0 \\ 2 & 0 & 4 \end{bmatrix} \]

by solving the equation

\[ \begin{vmatrix} 3-\lambda & 2 & 2 \\ 2 & 2-\lambda & 0 \\ 2 & 0 & 4-\lambda \end{vmatrix} = 0. \]


Since the left-hand side of this equation is $\det(A - \lambda I)$, it can be written in
the form $\det(A - \lambda I) = 0$.


As in the case of a $2 \times 2$ matrix, the values of $\operatorname{tr} A$ and $\det A$ provide a useful
check on the calculations. For Example 3.1, we see that $\operatorname{tr} A = 3 + 2 + 4 = 9$
and the sum of the eigenvalues is also 9. Also, we have

\[ \det A = 3\begin{vmatrix} 2 & 0 \\ 0 & 4 \end{vmatrix} - 2\begin{vmatrix} 2 & 0 \\ 2 & 4 \end{vmatrix} + 2\begin{vmatrix} 2 & 2 \\ 2 & 0 \end{vmatrix} = 24 - 16 - 8 = 0 \]

and the product of the eigenvalues is also 0.
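
Both kinds of check used for Example 3.1 (multiplying out $A\mathbf{v}$, and comparing with $\operatorname{tr} A$ and $\det A$) can be carried out in a few lines. The Python sketch below is an illustration only; the helper names `det3` and `mat_vec` are assumptions made here.

```python
def det3(M):
    """Determinant of a 3x3 matrix, expanded by the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


A = [[3, 2, 2], [2, 2, 0], [2, 0, 4]]
pairs = [(6, [2, 1, 2]), (3, [1, 2, -2]), (0, [-2, 2, 1])]

# Each eigenvector must satisfy A v = lambda v.
for lam, v in pairs:
    assert mat_vec(A, v) == [lam * x for x in v]

eigenvalues = [lam for lam, _ in pairs]
trace = sum(A[k][k] for k in range(3))
product = 1
for lam in eigenvalues:
    product *= lam

print(trace == sum(eigenvalues), det3(A) == product)  # True True
```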


Procedure 3.1 Eigenvalues and eigenvectors of an $n \times n$ matrix


To find the eigenvalues and eigenvectors of an $n \times n$ matrix $A$:


(a) solve the characteristic equation

\[ \det(A - \lambda I) = 0, \]

to determine the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$;


(b) solve the corresponding eigenvector equations $(A - \lambda_i I)\mathbf{v}_i = \mathbf{0}$ for
each eigenvalue $\lambda_i$, to find a corresponding eigenvector $\mathbf{v}_i$.


The characteristic equation of an $n \times n$ matrix has $n$ solutions (some of which
may be repeated). If there are $n$ distinct solutions, there will be $n$ linearly
independent eigenvectors (but there may be fewer if any of the eigenvalues is
repeated).


This procedure extends Procedures 2.1 and 2.2 to $n \times n$ matrices.


Exercise 3.2 


The eigenvalues of the matrix

\[ A = \begin{bmatrix} 1 & 0 & -1 \\ 1 & 2 & 1 \\ 2 & 2 & 3 \end{bmatrix} \]

are $\lambda = 1$, $\lambda = 2$ and $\lambda = 3$. Write down the eigenvector equations, and
determine corresponding eigenvectors.
*Exercise 3.3


Determine the characteristic equation of the matrix
$A = \begin{bmatrix} 0 & 0 & 6 \\ \tfrac{1}{2} & 0 & 0 \\ 0 & \tfrac{1}{3} & 0 \end{bmatrix}$.


Verify that $\lambda = 1$ is an eigenvalue of $A$, and find a corresponding eigenvector.
(You do not need to find any other eigenvalues.)


3.2 Finding eigenvalues by hand 


For the matrices that arise from real applications, it is rarely possible to 
calculate the eigenvalues and eigenvectors by hand, and we would generally 
use numerical techniques such as those discussed in the next section. How- 
ever, finding eigenvalues and eigenvectors by hand is an important part of 
the learning process. In this course you can divide the exercises into two 
types: those for which it is easy to find the eigenvalues by hand, and the rest 
(for which you will probably need to use the computer algebra package for 
the course). The examples and exercises in this section have been carefully 
chosen to be of the former kind. 


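Example 3.2
Find the eigenvalues of the matrix

\[ A = \begin{bmatrix} 5 & 0 & 0 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{bmatrix}. \]

Solution


The characteristic equation is

\[ \begin{vmatrix} 5-\lambda & 0 & 0 \\ 1 & 2-\lambda & 1 \\ 1 & 1 & 2-\lambda \end{vmatrix} = 0. \]

Expanding the determinant by the first row gives

\[ (5-\lambda)\begin{vmatrix} 2-\lambda & 1 \\ 1 & 2-\lambda \end{vmatrix} = 0, \]

hence $(5 - \lambda)(\lambda^2 - 4\lambda + 3) = 0$. So $5 - \lambda = 0$ or $\lambda^2 - 4\lambda + 3 = 0$. Hence
one solution is $\lambda = 5$. The quadratic equation $\lambda^2 - 4\lambda + 3 = 0$ has roots 1
and 3. Thus the eigenvalues are $\lambda = 5$, $\lambda = 3$ and $\lambda = 1$. ■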


Exercise 3.4 


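Find the eigenvalues of each of the following matrices.


(a) $A = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}$ (b) $A = \begin{bmatrix} 1 & 0 & 0 \\ 6 & 2 & 0 \\ 5 & 4 & 3 \end{bmatrix}$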


Often the most arduous part of such problems is the expansion of the de- 
terminant, but judicious interchanging of rows and/or taking the transpose 
can reduce the work, as the following example shows. 
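Example 3.3

Find the eigenvalues of the matrix $A = \begin{bmatrix} 5 & 2 & 0 \\ 2 & 5 & 0 \\ -3 & 4 & 6 \end{bmatrix}$.

Solution


The characteristic equation is

\[ \begin{vmatrix} 5-\lambda & 2 & 0 \\ 2 & 5-\lambda & 0 \\ -3 & 4 & 6-\lambda \end{vmatrix} = 0. \]


To make the calculation easier, we should like the third column to be the
first row. Taking the transpose gives

\[ \begin{vmatrix} 5-\lambda & 2 & -3 \\ 2 & 5-\lambda & 4 \\ 0 & 0 & 6-\lambda \end{vmatrix} = 0, \]


and interchanging rows 1 and 3 gives

\[ \begin{vmatrix} 0 & 0 & 6-\lambda \\ 2 & 5-\lambda & 4 \\ 5-\lambda & 2 & -3 \end{vmatrix} = 0. \]

Expanding the determinant by the first row gives

\[ (6-\lambda)\begin{vmatrix} 2 & 5-\lambda \\ 5-\lambda & 2 \end{vmatrix} = 0, \]

hence $(6 - \lambda)(4 - (5 - \lambda)^2) = 0$. So $6 - \lambda = 0$ or $4 - (5 - \lambda)^2 = 0$. Hence one
solution is $\lambda = 6$. The quadratic equation $4 - (5 - \lambda)^2 = 0$ can be rewritten
as $(5 - \lambda)^2 = 4$, so $5 - \lambda = \pm 2$, and the two other solutions are $\lambda = 7$ and
$\lambda = 3$. Thus the eigenvalues are $\lambda = 6$, $\lambda = 7$ and $\lambda = 3$. ■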
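
The two determinant properties used in this example (taking the transpose leaves the determinant unchanged, and interchanging two rows changes its sign) can be seen numerically by evaluating $\det(A - \lambda I)$ at a value of $\lambda$ that is not an eigenvalue. In the Python sketch below, the helper names and the test value $\lambda = 1$ are assumptions made for illustration.

```python
def det3(M):
    """Determinant of a 3x3 matrix, expanded by the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def shifted(A, lam):
    """Return A - lam I for a 3x3 matrix A."""
    return [[A[r][c] - (lam if r == c else 0) for c in range(3)] for r in range(3)]


def transpose(M):
    return [list(column) for column in zip(*M)]


A = [[5, 2, 0], [2, 5, 0], [-3, 4, 6]]

M = shifted(A, 1)                 # lam = 1 is not an eigenvalue
Mt = transpose(M)
swapped = [Mt[2], Mt[1], Mt[0]]   # interchange rows 1 and 3 of the transpose
print(det3(M), det3(Mt), det3(swapped))  # 60 60 -60

# At each eigenvalue the determinant is zero, so the change of sign has no effect.
print([det3(shifted(A, lam)) for lam in (3, 6, 7)])  # [0, 0, 0]
```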


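Exercise 3.5


Find the eigenvalues of the matrix $A = \begin{bmatrix} 8 & 0 & -5 \\ 9 & 3 & -6 \\ 10 & 0 & -7 \end{bmatrix}$.

Exercise 3.6

Verify that the eigenvalues of the triangular matrices
$\begin{bmatrix} a & 0 & 0 \\ d & b & 0 \\ e & f & c \end{bmatrix}$ and
$\begin{bmatrix} a & d & e \\ 0 & b & f \\ 0 & 0 & c \end{bmatrix}$ are the diagonal entries $a$, $b$ and $c$.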


Exercise 3.7 
Verify that the sum of the eigenvalues is tr A for the matrices A in: 
(a) Examples 3.2 and 3.3; (b) Exercises 3.2, 3.4 and 3.5. 


In each case, write down the value of det A, and verify that this is the 
product of the eigenvalues. 


The eigenvalues of a 3 x 3 matrix can be real and distinct (as in Exercise 3.2), 
or real and repeated (as in Exercise 3.5), or one may be real and the other 
two form a complex conjugate pair — as in the following example. 


Recall from Unit 9 that 
taking the transpose does not 
alter the value of the 
determinant. 


This interchange changes the 
sign of the determinant, but 
this has no effect since we 
know that the value of the 
determinant is zero. 


You will need the eigenvalues 
of this matrix in Exercise 3.7. 


See Subsection 2.3. 


Repeated real eigenvalues may be repeated once,
$\lambda_1 = \lambda_2 \neq \lambda_3$,
or twice,
$\lambda_1 = \lambda_2 = \lambda_3$.
Example 3.4

Find the eigenvalues and corresponding eigenvectors of the matrix

\[ A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{bmatrix}. \]

Solution


The characteristic equation is

\[ \begin{vmatrix} 1-\lambda & 0 & 0 \\ 0 & -\lambda & 1 \\ 0 & -1 & -\lambda \end{vmatrix} = 0. \]

Expanding the determinant by the first row gives

\[ (1-\lambda)\begin{vmatrix} -\lambda & 1 \\ -1 & -\lambda \end{vmatrix} = (1-\lambda)(\lambda^2 + 1) = 0, \]

so $1 - \lambda = 0$ or $\lambda^2 + 1 = 0$. The quadratic equation $\lambda^2 + 1 = 0$ has roots
$\lambda = i$ and $\lambda = -i$, where $i^2 = -1$. Thus the eigenvalues are $\lambda = 1$, $\lambda = i$
and $\lambda = -i$.

The eigenvector equations are

\[ \begin{aligned} (1-\lambda)x_1 &= 0, \\ -\lambda x_2 + x_3 &= 0, \\ -x_2 - \lambda x_3 &= 0. \end{aligned} \]


$\lambda = 1$: The eigenvector equations become $0 = 0$, $-x_2 + x_3 = 0$ and
$-x_2 - x_3 = 0$, which reduce to the equations $x_2 = 0$ and $x_3 = 0$. There
is no constraint on the value of $x_1$, so we may choose it as we please. It
follows that a corresponding eigenvector is $[1 \;\; 0 \;\; 0]^T$.


$\lambda = i$: The eigenvector equations become $(1 - i)x_1 = 0$, $-ix_2 + x_3 = 0$ and
$-x_2 - ix_3 = 0$, which reduce to the equations $x_1 = 0$ and $ix_2 = x_3$. It follows
that a corresponding eigenvector is $[0 \;\; 1 \;\; i]^T$.


$\lambda = -i$: The eigenvector equations become $(1 + i)x_1 = 0$, $ix_2 + x_3 = 0$
and $-x_2 + ix_3 = 0$, which reduce to the equations $x_1 = 0$ and $-ix_2 = x_3$. It
follows that a corresponding eigenvector is $[0 \;\; 1 \;\; -i]^T$. ■


In Example 3.4 we have three distinct eigenvalues and three distinct eigen- 
vectors, but only the real eigenvalue \ = 1 gives rise to a real eigenvector. 
Exercise 3.8 


Calculate the eigenvalues and eigenvectors of the following matrices. 
1 1 


0 
(a) 1 
0 


Ee COO 


0 1 YG: 
1 0 (b) 
0 O Ah. 


2 


v2 
0 
1 

v2 


V2 
0) 
1 

J2 


We have dealt with 3 x 3 matrices for which the eigenvalues can be found 
easily, and it is reasonable to solve such problems by hand. Generally, the 
larger matrices that arise in practical problems are better dealt with in an 
entirely different fashion, as we shall see in the next section. However, our 
experience of finding the eigenvalues and eigenvectors of the simpler kinds 
of 2 x 2 and 3 x 3 matrices will not be wasted, for it will allow us to see 
what types of solution we may expect for larger matrices. 




In each of the cases considered in this section, if a 3 x 3 matrix has three 
distinct eigenvalues, then the corresponding eigenvectors are linearly inde- 
pendent. More generally, any n x n matrix with n distinct eigenvalues has 
n linearly independent eigenvectors, although we shall not prove this here. 


If eigenvalues are repeated, then the situation becomes more complicated. 
We shall leave that discussion to the next unit, when we return to this topic 
in the context of solving systems of differential equations. 


End-of-section Exercise 


Exercise 3.9 


Find the eigenvalues and corresponding eigenvectors of the following matri- 
ces. 


o 7 24 02 0 
(a) |0 -3 2 (b) |-2 0 0 
0 0 4 001 
1 00 tt @ 
(c) Jo 1 2 (d) |o 1 0 
me i i 


4 Iterative methods 


Finding the eigenvalues and eigenvectors of a 3 x 3 matrix using the method 
of Section 3 can be quite a laborious process, and the calculations become 
progressively more difficult for larger matrices. 


In this section we show how we can often find approximations to real eigen- 
vectors and their corresponding eigenvalues by iteration — that is, by choos- 
ing a vector and applying the matrix repeatedly. 


4.1 Approximating eigenvectors 


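In the Introduction we considered a migration problem in which the towns
Exton and Wyeville have a regular interchange of population. We saw that
if $x_n$ and $y_n$ denote the respective populations of Exton and Wyeville at the
beginning of year $n$, then the corresponding populations at the beginning of
year $n + 1$ are given by the matrix equation

\[ \begin{bmatrix} x_{n+1} \\ y_{n+1} \end{bmatrix} = \begin{bmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{bmatrix} \begin{bmatrix} x_n \\ y_n \end{bmatrix}. \]

Using this equation, we saw that if the initial populations are $x_0 = 10\,000$
and $y_0 = 8000$, then the populations in successive years are

\[ \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = \begin{bmatrix} 10\,000 \\ 8000 \end{bmatrix}, \quad \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} 10\,600 \\ 7400 \end{bmatrix}, \quad \begin{bmatrix} x_2 \\ y_2 \end{bmatrix} = \begin{bmatrix} 11\,020 \\ 6980 \end{bmatrix}, \quad \ldots \]


As $n$ increases, the sequence of vectors $[x_n \;\; y_n]^T$ converges to the vector
$[x \;\; y]^T = [12\,000 \;\; 6000]^T$, which is an eigenvector of the above $2 \times 2$ matrix.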
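
The migration sequence is easy to generate by repeating the matrix multiplication. The Python sketch below uses floating-point arithmetic, so the later values are approximate; the function name `step` and the choice of 100 years are assumptions made for illustration.

```python
def step(x, y):
    """One year of migration: apply the matrix [[0.9, 0.2], [0.1, 0.8]]."""
    return 0.9 * x + 0.2 * y, 0.1 * x + 0.8 * y


x, y = 10000.0, 8000.0
history = [(x, y)]
for year in range(100):
    x, y = step(x, y)
    history.append((x, y))

# Print a few of the population vectors, rounded to whole people.
for n in (0, 1, 2, 10, 100):
    print(n, round(history[n][0]), round(history[n][1]))
```

The total population stays at 18 000 throughout, and the printed values approach 12 000 and 6000.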


Section 4 Iterative methods 
More generally, suppose that we wish to find the eigenvectors of a given
matrix $A$, and that we have an initial estimate $\mathbf{e}_0$ for an eigenvector. It
may happen that $\mathbf{e}_0$ is an eigenvector of $A$ and $A\mathbf{e}_0$ is in the same direction
as $\mathbf{e}_0$, as in Figure 4.1. If, as usually happens, $\mathbf{e}_0$ is not an eigenvector, then
we calculate the vector $\mathbf{e}_1 = A\mathbf{e}_0$ and then another vector, $\mathbf{e}_2$, defined by
$\mathbf{e}_2 = A\mathbf{e}_1 = A^2\mathbf{e}_0$. Continuing in this way, we obtain a sequence of vectors

\[ \mathbf{e}_0, \quad \mathbf{e}_1 = A\mathbf{e}_0, \quad \mathbf{e}_2 = A^2\mathbf{e}_0, \quad \mathbf{e}_3 = A^3\mathbf{e}_0, \quad \mathbf{e}_4 = A^4\mathbf{e}_0, \quad \ldots, \]

as shown in Figure 4.2, where each vector in the sequence is obtained from
the previous one by multiplying by the matrix $A$. Often this simple method
of repeatedly applying the matrix $A$ produces a sequence of vectors that
converges to an eigenvector.
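
The repeated multiplication can be sketched in Python as follows. The matrix used here is chosen purely for illustration: $\begin{bmatrix} 2 & 3 \\ 2 & 1 \end{bmatrix}$ has eigenvalues 4 and $-1$, with $[3 \;\; 2]^T$ an eigenvector for the eigenvalue 4. The function names `apply` and `iterate` are assumptions made here. The final column shows the ratio of the two components, which indicates the direction of each vector.

```python
def apply(A, v):
    """Return the product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def iterate(A, e0, steps):
    """Return [e0, A e0, A^2 e0, ...] with `steps` multiplications."""
    vectors = [e0]
    for _ in range(steps):
        vectors.append(apply(A, vectors[-1]))
    return vectors


A = [[2, 3], [2, 1]]
for n, (x, y) in enumerate(iterate(A, [1, 0], 8)):
    print(n, (x, y), round(y / x, 4))
```

The vectors themselves grow in size, but the ratio $y/x$ settles towards $2/3$, the ratio for the eigenvector $[3 \;\; 2]^T$. It is in this sense, up to a scalar multiple, that the sequence approaches an eigenvector.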


*Exercise 4.1 


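Given

\[ A = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}, \quad \mathbf{e}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad \mathbf{e}_{n+1} = A\mathbf{e}_n, \quad n = 0, 1, 2, \ldots, \]

calculate $\mathbf{e}_1$, $\mathbf{e}_2$, $\mathbf{e}_3$ and $\mathbf{e}_4$.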


From Example 2.2 we know that for $A = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}$, $\mathbf{v}_1 = [1 \;\; 1]^T$ is an
eigenvector corresponding to the eigenvalue 5, and $\mathbf{v}_2 = [-2 \;\; 1]^T$ is an
eigenvector corresponding to the eigenvalue 2. We may suspect that the sequence
$\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3, \mathbf{e}_4, \ldots$ in Exercise 4.1 converges to a scalar multiple of the
eigenvector $[1 \;\; 1]^T$, but how can we be sure that it does?


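Suppose that we express our initial vector $\mathbf{e}_0 = [1 \;\; 0]^T$ as a linear
combination of $\mathbf{v}_1$ and $\mathbf{v}_2$, so that $\mathbf{e}_0 = \alpha\mathbf{v}_1 + \beta\mathbf{v}_2$ for some numbers $\alpha$ and $\beta$.
Then

\[ \begin{bmatrix} 1 & -2 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}. \]


Multiplying both sides of this equation on the left by the inverse of the
matrix on the left-hand side, we see that

\[ \begin{bmatrix} \alpha \\ \beta \end{bmatrix} = \frac{1}{3} \begin{bmatrix} 1 & 2 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} \tfrac{1}{3} \\ -\tfrac{1}{3} \end{bmatrix}, \]

so $\alpha = \tfrac{1}{3}$, $\beta = -\tfrac{1}{3}$ and

\[ \mathbf{e}_0 = \tfrac{1}{3}\mathbf{v}_1 - \tfrac{1}{3}\mathbf{v}_2. \]

We can now express $\mathbf{e}_1, \mathbf{e}_2, \ldots$ in terms of $\mathbf{v}_1$ and $\mathbf{v}_2$. Since $\mathbf{v}_1$ and $\mathbf{v}_2$ are
eigenvectors, and we know that $A\mathbf{v}_1 = 5\mathbf{v}_1$ and $A\mathbf{v}_2 = 2\mathbf{v}_2$, we have

\[ \mathbf{e}_1 = A\mathbf{e}_0 = \tfrac{1}{3}A\mathbf{v}_1 - \tfrac{1}{3}A\mathbf{v}_2 = \tfrac{1}{3}(5\mathbf{v}_1) - \tfrac{1}{3}(2\mathbf{v}_2). \]

Applying $A$ repeatedly gives

\[ \mathbf{e}_2 = A\mathbf{e}_1 = \tfrac{1}{3}(5A\mathbf{v}_1) - \tfrac{1}{3}(2A\mathbf{v}_2) = \tfrac{1}{3}(5^2\mathbf{v}_1) - \tfrac{1}{3}(2^2\mathbf{v}_2), \]

\[ \mathbf{e}_3 = A\mathbf{e}_2 = \tfrac{1}{3}(5^2A\mathbf{v}_1) - \tfrac{1}{3}(2^2A\mathbf{v}_2) = \tfrac{1}{3}(5^3\mathbf{v}_1) - \tfrac{1}{3}(2^3\mathbf{v}_2), \]

and so on.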
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y Aey 
€ 
0 x 
Figure 4.1 


Figure 4.2 


We can do this if there are two linearly independent eigenvectors (see
Exercise 1.2 and the text following that exercise).

$\alpha$ and $\beta$ are known as the components of $\mathbf{v}_1$ and $\mathbf{v}_2$ in $\mathbf{e}_0$.

A matrix formed from linearly independent vectors in this way always has an
inverse.

$\mathbf{e}_2 = \mathbf{A}\mathbf{e}_1 = \mathbf{A}^2\mathbf{e}_0$


In general, we can write

$$\mathbf{e}_n = \mathbf{A}^n\mathbf{e}_0 = \tfrac{1}{3}(5^n\mathbf{v}_1) - \tfrac{1}{3}(2^n\mathbf{v}_2).$$

For example,

$$\mathbf{e}_{10} = \mathbf{A}^{10}\mathbf{e}_0 = \tfrac{1}{3}(5^{10}\mathbf{v}_1) - \tfrac{1}{3}(2^{10}\mathbf{v}_2) \simeq 3\,255\,208\,\mathbf{v}_1 - 341\,\mathbf{v}_2 \simeq 3\,255\,208\,(\mathbf{v}_1 - 0.000\,105\,\mathbf{v}_2).$$

As you can see, the powers of 5 rapidly become much larger than the powers
of 2, and for large values of $n$ we can ignore the latter, to give the
approximation

$$\mathbf{e}_n = \mathbf{A}^n\mathbf{e}_0 \simeq \tfrac{1}{3}(5^n\mathbf{v}_1).$$

This is a scalar multiple of an eigenvector that corresponds to the eigenvalue
of larger magnitude.


Thus repeatedly applying $\mathbf{A}$ leads to an approximation of an eigenvector,
namely an eigenvector corresponding to the eigenvalue of larger magnitude.
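One way to see how quickly the $\mathbf{v}_2$ term fades is to divide the general expression for $\mathbf{e}_n$ by $\tfrac{1}{3}5^n$. The following worked step is a sketch that uses only the formula for $\mathbf{e}_n$ derived in this subsection:

```latex
\[
\frac{3}{5^{n}}\,\mathbf{e}_n
  = \mathbf{v}_1 - \left(\tfrac{2}{5}\right)^{n}\mathbf{v}_2,
\qquad
\left(\tfrac{2}{5}\right)^{10} = \frac{1024}{9\,765\,625} \approx 0.000\,105 .
\]
```

The unwanted component is therefore multiplied by $\tfrac{2}{5}$, the ratio of the two eigenvalues, at each step.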


Taking another example, you can show that the matrix 


Am | | 


has eigenvalues —3 and 2, and that the above method will give approxima- 
tions to an eigenvector corresponding to the eigenvalue —3 because powers 
of —3 eventually dominate powers of 2. 


*Exercise 4.2 
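Use

$$\mathbf{A} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}, \quad \mathbf{e}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad \mathbf{e}_{n+1} = \mathbf{A}\mathbf{e}_n, \quad n = 0, 1, 2, \ldots.$$

(a) Calculate $\mathbf{e}_1$, $\mathbf{e}_2$ and $\mathbf{e}_3$.

(b) Given $\mathbf{e}_{10} = [29\,525 \;\; 29\,524]^T$, calculate $\mathbf{e}_{11}$.

(c) Use Procedures 2.1 and 2.2 to find the eigenvalues $\lambda_1$ and $\lambda_2$, and
corresponding eigenvectors $\mathbf{v}_1$ and $\mathbf{v}_2$, of $\mathbf{A}$.

(d) Express $\mathbf{e}_0$ as a linear combination of $\mathbf{v}_1$ and $\mathbf{v}_2$.

(e) Express $\mathbf{e}_1$, $\mathbf{e}_2$, and hence $\mathbf{e}_n$, as linear combinations of $\mathbf{v}_1$ and $\mathbf{v}_2$.

(f) To which eigenvector does the sequence $\mathbf{e}_n$ provide an approximation?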


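The above technique provides us with increasingly accurate approximations
to one of the eigenvectors of a $2 \times 2$ matrix. But the most significant
aspect of the method is that it is possible to extend it to matrices
of any size. However, there are difficulties, and you should be aware of
them before we proceed. You may have noticed in the previous exercise
that the components of an approximation to an eigenvector may be quite
large and, had we attempted to calculate $\mathbf{e}_{20}$, we should have found that
$\mathbf{e}_{20} = [1\,743\,392\,201 \;\; 1\,743\,392\,200]^T$. For larger values of $n$, $\mathbf{e}_n$ involves
even larger numbers. We shall see that this difficulty is easily overcome, but
there are other difficulties. Table 4.1 shows five examples, each exhibiting a
different problem.

Table 4.1

| | Matrix | Eigenvalues | Corresponding eigenvectors | Initial vector and $n$th approximation |
|---|---|---|---|---|
| (a) | $\begin{bmatrix} 1 & 4 \\ 9 & 1 \end{bmatrix}$ | $7$, $-5$ | $\mathbf{v}_1 = [2 \;\; 3]^T$, $\mathbf{v}_2 = [2 \;\; {-3}]^T$ | $\mathbf{e}_0 = [1 \;\; 0]^T$; $\mathbf{e}_n = \tfrac{1}{4}(7)^n\mathbf{v}_1 + \tfrac{1}{4}(-5)^n\mathbf{v}_2$ |
| (b) | $\begin{bmatrix} 2 & 0 \\ 0 & -2 \end{bmatrix}$ | $2$, $-2$ | $\mathbf{v}_1 = [1 \;\; 0]^T$, $\mathbf{v}_2 = [0 \;\; 1]^T$ | $\mathbf{e}_0 = [1 \;\; 1]^T$; $\mathbf{e}_n = 2^n\mathbf{v}_1 + (-2)^n\mathbf{v}_2$ |
| (c) | $\begin{bmatrix} 1.2 & -0.2 \\ 0.3 & 0.7 \end{bmatrix}$ | $1$, $0.9$ | $\mathbf{v}_1 = [1 \;\; 1]^T$, $\mathbf{v}_2 = [2 \;\; 3]^T$ | $\mathbf{e}_0 = [1 \;\; 0]^T$; $\mathbf{e}_n = 3\mathbf{v}_1 - (0.9)^n\mathbf{v}_2$ |
| (d) | $\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$ | $i$, $-i$ | $\mathbf{v}_1 = [1 \;\; {-i}]^T$, $\mathbf{v}_2 = [1 \;\; i]^T$ | $\mathbf{e}_0 = [1 \;\; 0]^T$; $\mathbf{e}_n = \tfrac{1}{2}(i)^n\mathbf{v}_1 + \tfrac{1}{2}(-i)^n\mathbf{v}_2$ |
| (e) | $\begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}$ | $3$, $2$, $1$ | $\mathbf{v}_1 = [1 \;\; 0 \;\; 0]^T$, $\mathbf{v}_2 = [0 \;\; 1 \;\; 0]^T$, $\mathbf{v}_3 = [0 \;\; 0 \;\; 1]^T$ | $\mathbf{e}_0 = [0 \;\; 1 \;\; 1]^T$; $\mathbf{e}_n = 2^n\mathbf{v}_2 + \mathbf{v}_3$ |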


In row (a) of the table, the elements of $\mathbf{e}_n$ become very large: for example,

$$\mathbf{e}_5 = [6841 \;\; 14\,949]^T$$

and

$$\mathbf{e}_{10} = [146\,120\,437 \;\; 204\,532\,218]^T.$$

This may cause difficulties in the calculations. You may already suspect how
this difficulty may be overcome. We are interested in only the directions of
the vectors, and rescaling a vector does not change its direction. So, dividing
both components of $\mathbf{e}_{10}$ by $204\,532\,218$, we obtain the vector $[0.7144 \;\; 1]^T$
(to four decimal places) as the estimate of an eigenvector.


In (b) we have a rather more fundamental problem. The eigenvalues have
the same magnitude, so, as $n$ increases, neither the term involving $\mathbf{v}_1$ nor
the term involving $\mathbf{v}_2$ becomes dominant, so the iteration does not converge.


The eigenvalues in (c) are certainly not equal, but they are similar in
magnitude. This means that we need to choose a very large value of $n$ in order
to obtain a good approximation for $\mathbf{v}_1$ (the eigenvector corresponding to the
eigenvalue of larger magnitude).


In (d) we have complex eigenvalues and, as you might expect, a sequence of
real vectors cannot converge to a complex eigenvector.


In (e) we see that the sequence $\mathbf{e}_n$ converges to the eigenvector $\mathbf{v}_2$, when
we might expect it to converge to $\mathbf{v}_1$ (the eigenvector corresponding to the
eigenvalue of largest magnitude). This is because the original estimate $\mathbf{e}_0$
contains no component of $\mathbf{v}_1$, so the same is true of all subsequent estimates.
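The behaviour in rows (b) and (e) is easy to reproduce. The sketch below uses plain Python with exact integer arithmetic; the helper names `mat_vec` and `iterate` are illustrative and are not part of the course software.

```python
def mat_vec(a, v):
    """Multiply the matrix a (a list of rows) by the vector v."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def iterate(a, e0, steps):
    """Return A^steps e0, computed by repeated multiplication (no scaling)."""
    e = list(e0)
    for _ in range(steps):
        e = mat_vec(a, e)
    return e


# Row (b): the eigenvalues 2 and -2 have equal magnitude, so the
# direction of e_n keeps alternating and never settles down.
row_b = [[2, 0], [0, -2]]
print(iterate(row_b, [1, 1], 4))   # [16, 16]
print(iterate(row_b, [1, 1], 5))   # [32, -32]

# Row (e): e0 has no component of v1 = [1, 0, 0]^T, so the first
# entry stays at zero and the second entry (the v2 term) dominates.
row_e = [[3, 0, 0], [0, 2, 0], [0, 0, 1]]
print(iterate(row_e, [0, 1, 1], 10))   # [0, 1024, 1]
```

Because the arithmetic here is exact, the zero component in row (e) is preserved. In floating-point calculations with a less special matrix, rounding errors would usually introduce a small component of $\mathbf{v}_1$.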


The $\mathbf{e}_0$ are normally chosen rather arbitrarily.

You always generate a real sequence (unless you start with a complex $\mathbf{e}_0$).

Such a difficulty is unlikely to arise in practice because rounding errors in
the calculation of the iterates will normally ensure that the component of
$\mathbf{v}_1$ is not exactly zero.


Exercise 4.3
In the migration problem first discussed in the Introduction, we have

$$\mathbf{A} = \begin{bmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{bmatrix}, \quad \mathbf{e}_0 = \begin{bmatrix} 10\,000 \\ 8000 \end{bmatrix}.$$

Eigenvectors of $\mathbf{A}$ are $\mathbf{v}_1 = [2 \;\; 1]^T$ with corresponding eigenvalue 1, and
$\mathbf{v}_2 = [1 \;\; {-1}]^T$ with corresponding eigenvalue 0.7. (You found these
eigenvectors in Exercise 2.5.)

(a) Write $\mathbf{e}_0$ as a linear combination of $\mathbf{v}_1$ and $\mathbf{v}_2$.

(b) Obtain an expression for $\mathbf{e}_n = \mathbf{A}^n\mathbf{e}_0$ as a linear combination of $\mathbf{v}_1$
and $\mathbf{v}_2$.

(c) Explain what happens as $n$ becomes large.


4.2 Iterative techniques 


Suppose that we are given a square matrix whose eigenvalues are known to
be real and distinct in magnitude. Though we do not know their values,
assume that the eigenvalues are listed in decreasing order of magnitude.
For example, eigenvalues 5, −1 and −4 would be listed in the order

$$\lambda_1 = 5, \quad \lambda_2 = -4, \quad \lambda_3 = -1, \quad \text{since } |5| > |{-4}| > |{-1}|.$$

Using the ideas of the previous subsection, together with the results at the
end of Section 2, we can approximate all the eigenvalues and corresponding
eigenvectors. We start with the eigenvector corresponding to the eigenvalue
of largest magnitude, then we show how the other eigenvectors and
eigenvalues are approximated.


Eigenvalue of largest magnitude: direct iteration 


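In order to approximate an eigenvector corresponding to the eigenvalue $\lambda_{\max}$
of largest magnitude, we use the approach that we employed at the beginning
of the previous subsection. We start with a vector $\mathbf{e}_0$, and successively
calculate the new vectors

$$\mathbf{e}_1 = \mathbf{A}\mathbf{e}_0, \quad \mathbf{e}_2 = \mathbf{A}\mathbf{e}_1, \quad \mathbf{e}_3 = \mathbf{A}\mathbf{e}_2, \quad \ldots,$$

which is equivalent to writing $\mathbf{e}_n = \mathbf{A}^n\mathbf{e}_0$, $n = 1, 2, 3, \ldots$.

For example, if $\mathbf{A} = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}$ and $\mathbf{e}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, then, as you saw in Exercise 4.1,

$$\mathbf{e}_1 = \begin{bmatrix} 3 \\ 1 \end{bmatrix}, \quad \mathbf{e}_2 = \begin{bmatrix} 11 \\ 7 \end{bmatrix}, \quad \mathbf{e}_3 = \begin{bmatrix} 47 \\ 39 \end{bmatrix}, \quad \mathbf{e}_4 = \begin{bmatrix} 219 \\ 203 \end{bmatrix}.$$

The main difficulty with this method is that the components of $\mathbf{e}_n$ can
rapidly become very large (or very small). But we can overcome this problem
by setting $\alpha_n$ to be the component of largest magnitude in $\mathbf{e}_n$. Then dividing
the vector $\mathbf{e}_n$ by $\alpha_n$ ensures that the vector $\mathbf{e}_n/\alpha_n$ has components that are
less than, or equal to, one in magnitude. This process is called scaling,
giving a scaled vector.

For the above vectors, $\alpha_1 = 3$, $\alpha_2 = 11$, $\alpha_3 = 47$ and $\alpha_4 = 219$, and we
obtain the sequence of vectors

$$\begin{bmatrix} 1 \\ \tfrac{1}{3} \end{bmatrix}, \quad \begin{bmatrix} 1 \\ \tfrac{7}{11} \end{bmatrix}, \quad \begin{bmatrix} 1 \\ \tfrac{39}{47} \end{bmatrix}, \quad \begin{bmatrix} 1 \\ \tfrac{203}{219} \end{bmatrix}$$

(with the same directions as the sequence found in Exercise 4.1).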


The components of our final answer will now be of reasonable size. If this
scaling process is applied only at the final stage, we may still encounter very
large components in the intermediate calculations. This difficulty is avoided
by applying the scaling process at each step of the calculation, as in the
following procedure.

Procedure 4.1 Direct iteration

For any square matrix $\mathbf{A}$ for which the eigenvalue $\lambda_{\max}$ of largest
magnitude is real (and distinct in magnitude from any other eigenvalue),
choose any vector $\mathbf{e}_0$.

For $n = 0, 1, 2, \ldots$:

(i) calculate $\mathbf{z}_{n+1} = \mathbf{A}\mathbf{e}_n$;

(ii) find $\alpha_{n+1}$, the component of largest magnitude of $\mathbf{z}_{n+1}$;

(iii) put $\mathbf{e}_{n+1} = \mathbf{z}_{n+1}/\alpha_{n+1}$.

For sufficiently large $n$, $\mathbf{e}_n$ will be a good approximation to an
eigenvector corresponding to the eigenvalue of largest magnitude, provided
that $\mathbf{e}_0$ has a non-zero component of the required eigenvector. If the
sequence $\alpha_n$ converges, then it converges to $\lambda_{\max}$.


The final sentence of the above procedure can be deduced from the fact that
at each stage of the calculation we have $\mathbf{A}\mathbf{e}_n = \mathbf{z}_{n+1} = \alpha_{n+1}\mathbf{e}_{n+1}$. If $\mathbf{e}_n$
converges to a vector $\mathbf{e}$ and $\alpha_n$ converges to a number $\alpha$, then, in the limit,
we have $\mathbf{A}\mathbf{e} = \alpha\mathbf{e}$, so $\mathbf{e}$ is an eigenvector corresponding to the eigenvalue $\alpha$.
But we know that $\mathbf{e}$ is an eigenvector corresponding to the eigenvalue of
largest magnitude, so $\alpha$ must be this eigenvalue.
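The three steps of direct iteration translate almost line for line into code. The following is a minimal sketch in plain Python using exact fractions; the function names `mat_vec` and `direct_iteration` are illustrative assumptions, not part of the course software, and no check is made for a zero vector.

```python
from fractions import Fraction


def mat_vec(a, v):
    """Multiply the matrix a (a list of rows) by the vector v."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def direct_iteration(a, e0, steps):
    """Return (alpha_n, e_n) after n = steps scaled iterations."""
    e = [Fraction(x) for x in e0]
    alpha = None
    for _ in range(steps):
        z = mat_vec(a, e)                # step (i):   z = A e
        alpha = max(z, key=abs)          # step (ii):  component of largest magnitude
        e = [z_i / alpha for z_i in z]   # step (iii): scale
    return alpha, e


A = [[3, 2], [1, 4]]
alpha, e = direct_iteration(A, [1, 0], 3)
print(alpha, e)   # 47/11 [Fraction(1, 1), Fraction(39, 47)]

alpha, e = direct_iteration(A, [1, 0], 25)
print(float(alpha), [float(x) for x in e])
```

With more iterations the printed values approach 5 and $[1 \;\; 1]^T$, the eigenvalue of largest magnitude and a corresponding eigenvector of this matrix.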


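Example 4.1


Given $\mathbf{A} = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}$ and $\mathbf{e}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, use Procedure 4.1 to find $\mathbf{e}_1$, $\mathbf{e}_2$ and $\mathbf{e}_3$.


Solution


First iteration

(i) $\mathbf{z}_1 = \mathbf{A}\mathbf{e}_0 = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 1 \end{bmatrix}$

(ii) $\alpha_1 = 3$

(iii) $\mathbf{e}_1 = \dfrac{\mathbf{z}_1}{\alpha_1} = \dfrac{1}{3} \begin{bmatrix} 3 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ \tfrac{1}{3} \end{bmatrix}$


Second iteration

(i) $\mathbf{z}_2 = \mathbf{A}\mathbf{e}_1 = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix} \begin{bmatrix} 1 \\ \tfrac{1}{3} \end{bmatrix} = \begin{bmatrix} \tfrac{11}{3} \\ \tfrac{7}{3} \end{bmatrix}$

(ii) $\alpha_2 = \tfrac{11}{3}$

(iii) $\mathbf{e}_2 = \dfrac{\mathbf{z}_2}{\alpha_2} = \dfrac{3}{11} \begin{bmatrix} \tfrac{11}{3} \\ \tfrac{7}{3} \end{bmatrix} = \begin{bmatrix} 1 \\ \tfrac{7}{11} \end{bmatrix}$


Third iteration

(i) $\mathbf{z}_3 = \mathbf{A}\mathbf{e}_2 = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix} \begin{bmatrix} 1 \\ \tfrac{7}{11} \end{bmatrix} = \begin{bmatrix} \tfrac{47}{11} \\ \tfrac{39}{11} \end{bmatrix}$

(ii) $\alpha_3 = \tfrac{47}{11}$

(iii) $\mathbf{e}_3 = \dfrac{\mathbf{z}_3}{\alpha_3} = \dfrac{11}{47} \begin{bmatrix} \tfrac{47}{11} \\ \tfrac{39}{11} \end{bmatrix} = \begin{bmatrix} 1 \\ \tfrac{39}{47} \end{bmatrix}$


If we were to continue the process in Example 4.1, we should find that
the sequence of vectors $\mathbf{e}_n$ converges to the eigenvector $[1 \;\; 1]^T$, and the
sequence $\alpha_n$ converges to the corresponding eigenvalue 5.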


*Exercise 4.4 


Given A = bs | and €9 = [1 0]”, use Procedure 4.1 to calculate e1, e2 


4 8 
and e3. What can you deduce about the eigenvalues and eigenvectors of A? 
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A matrix A with eigenvalues 
3 and —3 would not qualify, 
because the eigenvalues are 
equal in magnitude. 


If $\lambda_{\max}$ is complex, then its complex conjugate will also be an eigenvalue
of the same magnitude, so the procedure works only when the eigenvalue of
largest magnitude is real.


The eigenvalues of $\mathbf{A}$ are 5 and 2, with corresponding eigenvectors $[1 \;\; 1]^T$ and
$[-2 \;\; 1]^T$, respectively (see Example 2.2).


Eigenvalue of smallest magnitude: inverse iteration 


In order to approximate an eigenvector corresponding to the eigenvalue of
smallest magnitude, we adapt the above method of direct iteration. Suppose
again that the eigenvalues of an invertible matrix $\mathbf{A}$ are listed in decreasing
order of magnitude. It follows that their reciprocals must appear in
increasing order of magnitude. But, by the results of Subsection 2.4, the numbers
$\lambda^{-1}$ are the eigenvalues of the inverse matrix $\mathbf{A}^{-1}$ with the same
eigenvectors as the matrix $\mathbf{A}$. So the problem of approximating the eigenvalue of
smallest magnitude $\lambda$ of $\mathbf{A}$ is the same as that of approximating the
eigenvalue of largest magnitude $\lambda^{-1}$ of $\mathbf{A}^{-1}$. It follows that repeatedly applying
the matrix $\mathbf{A}^{-1}$ produces an eigenvector corresponding to the eigenvalue of
smallest magnitude, assuming that it is real and distinct in magnitude from
any other eigenvalue of $\mathbf{A}$.


For any square matrix $\mathbf{A}$ with non-zero real eigenvalues, we find an
eigenvector corresponding to an eigenvalue of smallest magnitude by choosing
a vector $\mathbf{e}_0$ and successively calculating the vectors $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3, \ldots$ using the
formula

$$\mathbf{e}_{n+1} = \mathbf{A}^{-1}\mathbf{e}_n, \quad n = 0, 1, 2, \ldots.$$

In practice, such calculations can suffer from the same difficulties as direct
iteration. We solve the problem of very large or very small vectors as before,
by scaling, writing

$$\mathbf{z}_{n+1} = \mathbf{A}^{-1}\mathbf{e}_n, \quad \mathbf{e}_{n+1} = \frac{\mathbf{z}_{n+1}}{\alpha_{n+1}}, \quad n = 0, 1, 2, \ldots,$$

where $\alpha_{n+1}$ is the component of largest magnitude of $\mathbf{z}_{n+1}$. However, there
is a further complication: the calculation of the inverse matrix can be
very time-consuming for large matrices. A more practical approach is based
on solving the equations $\mathbf{A}\mathbf{z}_{n+1} = \mathbf{e}_n$ for $\mathbf{z}_{n+1}$ by Gaussian elimination and
then putting $\mathbf{e}_{n+1} = \mathbf{z}_{n+1}/\alpha_{n+1}$.


Procedure 4.2 Inverse iteration


For any invertible (square) matrix $\mathbf{A}$ for which the eigenvalue $\lambda_{\min}$ of
smallest magnitude is real and distinct in magnitude from any other
eigenvalue, choose any vector $\mathbf{e}_0$.

(a) For $n = 0, 1, 2, \ldots$:
(i) calculate $\mathbf{z}_{n+1} = \mathbf{A}^{-1}\mathbf{e}_n$;
(ii) find $\alpha_{n+1}$, the component of largest magnitude of $\mathbf{z}_{n+1}$;
(iii) put $\mathbf{e}_{n+1} = \mathbf{z}_{n+1}/\alpha_{n+1}$.

(b) The above method is inefficient for large matrices due to the
difficulty of calculating $\mathbf{A}^{-1}$. In such cases, for $n = 0, 1, 2, \ldots$:
(i) calculate $\mathbf{z}_{n+1}$ by solving the equation $\mathbf{A}\mathbf{z}_{n+1} = \mathbf{e}_n$;
(ii) find $\alpha_{n+1}$, the component of largest magnitude of $\mathbf{z}_{n+1}$;
(iii) put $\mathbf{e}_{n+1} = \mathbf{z}_{n+1}/\alpha_{n+1}$.

For sufficiently large $n$, $\mathbf{e}_n$ will be a good approximation to an
eigenvector corresponding to the eigenvalue of smallest magnitude, provided
that $\mathbf{e}_0$ has a non-zero component of the required eigenvector. If the
sequence $\alpha_n$ converges, then it converges to $1/\lambda_{\min}$.
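Part (b) of inverse iteration, in which each step solves a linear system instead of forming the inverse, might be sketched as follows. This is plain Python with exact fractions; the function names `solve` and `inverse_iteration` and the test matrix $\begin{bmatrix} 3 & 2 \\ 3 & 4 \end{bmatrix}$ (eigenvalues 1 and 6) are assumptions made for illustration, not part of the course software.

```python
from fractions import Fraction


def solve(a, b):
    """Solve a z = b by Gaussian elimination with row interchanges."""
    n = len(a)
    # Augmented matrix [a | b] with exact rational entries.
    m = [[Fraction(x) for x in row] + [Fraction(b_i)] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            raise ValueError("matrix is not invertible")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    # Back substitution.
    z = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - s) / m[i][i]
    return z


def inverse_iteration(a, e0, steps):
    """Return (alpha_n, e_n); 1/alpha_n estimates the eigenvalue of smallest magnitude."""
    e = [Fraction(x) for x in e0]
    alpha = None
    for _ in range(steps):
        z = solve(a, e)                  # step (i):   solve A z = e
        alpha = max(z, key=abs)          # step (ii):  component of largest magnitude
        e = [z_i / alpha for z_i in z]   # step (iii): scale
    return alpha, e


alpha, e = inverse_iteration([[3, 2], [3, 4]], [1, 1], 3)
print(alpha, e)   # 11/12 [Fraction(1, 1), Fraction(-21, 22)]
```

For this matrix the values of $1/\alpha_n$ approach 1 and the vectors approach $[1 \;\; {-1}]^T$. A production implementation would normally factorise the matrix once and reuse the factors at every step, rather than repeating the elimination as this sketch does.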


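We are assuming that the eigenvalues of $\mathbf{A}$ are real and distinct in magnitude.


If $\mathbf{A}$ is invertible, then $\mathbf{A}^{-1}$ exists and the eigenvalues are non-zero (see
Subsection 2.3).


The fact that $\mathbf{A}$ is invertible ensures that $\lambda_{\min} \neq 0$ and that $\mathbf{A}^{-1}$ exists.


Example 4.2


Given the matrix $\mathbf{A} = \begin{bmatrix} 3 & 2 \\ 3 & 4 \end{bmatrix}$, calculate $\mathbf{A}^{-1}$. Given $\mathbf{e}_0 = [1 \;\; 1]^T$, use
Procedure 4.2(a) to calculate $\mathbf{e}_1$, $\mathbf{e}_2$ and $\mathbf{e}_3$.

Solution


We have

$$\mathbf{A}^{-1} = \frac{1}{6} \begin{bmatrix} 4 & -2 \\ -3 & 3 \end{bmatrix}.$$

First iteration

(i) $\mathbf{z}_1 = \mathbf{A}^{-1}\mathbf{e}_0 = \dfrac{1}{6} \begin{bmatrix} 4 & -2 \\ -3 & 3 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} \tfrac{1}{3} \\ 0 \end{bmatrix}$

(ii) $\alpha_1 = \tfrac{1}{3}$

(iii) $\mathbf{e}_1 = \dfrac{\mathbf{z}_1}{\alpha_1} = 3 \begin{bmatrix} \tfrac{1}{3} \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$


Second iteration

(i) $\mathbf{z}_2 = \mathbf{A}^{-1}\mathbf{e}_1 = \dfrac{1}{6} \begin{bmatrix} 4 & -2 \\ -3 & 3 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} \tfrac{2}{3} \\ -\tfrac{1}{2} \end{bmatrix}$

(ii) $\alpha_2 = \tfrac{2}{3}$

(iii) $\mathbf{e}_2 = \dfrac{\mathbf{z}_2}{\alpha_2} = \dfrac{3}{2} \begin{bmatrix} \tfrac{2}{3} \\ -\tfrac{1}{2} \end{bmatrix} = \begin{bmatrix} 1 \\ -\tfrac{3}{4} \end{bmatrix}$


Third iteration

(i) $\mathbf{z}_3 = \mathbf{A}^{-1}\mathbf{e}_2 = \dfrac{1}{6} \begin{bmatrix} 4 & -2 \\ -3 & 3 \end{bmatrix} \begin{bmatrix} 1 \\ -\tfrac{3}{4} \end{bmatrix} = \begin{bmatrix} \tfrac{11}{12} \\ -\tfrac{7}{8} \end{bmatrix}$

(ii) $\alpha_3 = \tfrac{11}{12}$

(iii) $\mathbf{e}_3 = \dfrac{\mathbf{z}_3}{\alpha_3} = \dfrac{12}{11} \begin{bmatrix} \tfrac{11}{12} \\ -\tfrac{7}{8} \end{bmatrix} = \begin{bmatrix} 1 \\ -\tfrac{21}{22} \end{bmatrix}$


If we were to continue the process in Example 4.2, we should find that
the sequence of vectors $\mathbf{e}_n$ converges to the eigenvector $[1 \;\; {-1}]^T$ and the
sequence $\alpha_n$ converges to 1, corresponding to the eigenvalue $1/1 = 1$.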


*Exercise 4.5 


8 5 


and e3. Hence find an approximation to the eigenvalue of A of smallest 
magnitude, and a corresponding eigenvector. 


Use Procedure 4.2(a) with A = i ;| and ep = i] to obtain e1, e2 


Specific eigenvalues: modified inverse iteration


The previous procedures are restricted to calculating the eigenvalue of largest
or smallest magnitude. Sometimes we do not need either of these, but rather
we need the eigenvalue closest to a given value. The following method will
allow us to find it. We assume that there is just one eigenvalue of $\mathbf{A}$ closest
to the given value $p$.


If $\lambda$ is an eigenvalue of a matrix $\mathbf{A}$ (and $p$ is not an eigenvalue of $\mathbf{A}$), then
$(\lambda - p)^{-1}$ is an eigenvalue of the matrix $(\mathbf{A} - p\mathbf{I})^{-1}$, and the corresponding
eigenvectors are unchanged (see Subsection 2.4). Suppose that $\lambda_1$ is the real
eigenvalue closest to $p$. In other words, $|\lambda_1 - p|$ is the smallest of all
possible choices of $|\lambda - p|$, thus $1/|\lambda_1 - p|$ is the largest of all possible
choices of $1/|\lambda - p|$. It follows that repeatedly applying the matrix
$(\mathbf{A} - p\mathbf{I})^{-1}$ to a chosen vector $\mathbf{e}_0$ produces an eigenvector corresponding to
the eigenvalue closest to $p$. Thus the sequence $\mathbf{e}_{n+1} = (\mathbf{A} - p\mathbf{I})^{-1}\mathbf{e}_n$
($n = 0, 1, 2, \ldots$) should produce a sequence of approximations to the
eigenvector.


This method suffers from the deficiencies mentioned for inverse iteration, so
we make similar refinements.


Procedure 4.3 Modified inverse iteration


Suppose that $\mathbf{A}$ is a square matrix for which one distinct real eigenvalue
$\lambda_1$ is closest to a given real number $p$. To find an eigenvector
corresponding to the eigenvalue closest to $p$, choose any vector $\mathbf{e}_0$.
(The procedure breaks down if $p$ is an eigenvalue.)

(a) For $n = 0, 1, 2, \ldots$:
(i) calculate $\mathbf{z}_{n+1} = (\mathbf{A} - p\mathbf{I})^{-1}\mathbf{e}_n$;
(ii) find $\alpha_{n+1}$, the component of largest magnitude of $\mathbf{z}_{n+1}$;
(iii) put $\mathbf{e}_{n+1} = \mathbf{z}_{n+1}/\alpha_{n+1}$.

(b) The above method is inefficient for large matrices due to the
difficulty of calculating $(\mathbf{A} - p\mathbf{I})^{-1}$. In such cases, for $n = 0, 1, 2, \ldots$:
(i) calculate $\mathbf{z}_{n+1}$ by solving the equation $(\mathbf{A} - p\mathbf{I})\mathbf{z}_{n+1} = \mathbf{e}_n$;
(ii) find $\alpha_{n+1}$, the component of largest magnitude of $\mathbf{z}_{n+1}$;
(iii) put $\mathbf{e}_{n+1} = \mathbf{z}_{n+1}/\alpha_{n+1}$.

For sufficiently large $n$, $\mathbf{e}_n$ will be a good approximation to an
eigenvector corresponding to the eigenvalue closest to $p$, provided that $\mathbf{e}_0$ has
a non-zero component of the required eigenvector. If the sequence $\alpha_n$
converges, then it converges to $1/(\lambda_1 - p)$.
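Part (b) of modified inverse iteration might be sketched as follows in plain Python with floating-point arithmetic. The function names `solve` and `modified_inverse_iteration`, and the choice $p = 1.5$ for the matrix of Example 2.2, are assumptions made for illustration. Since $\alpha_n$ tends to $1/(\lambda_1 - p)$, the sketch returns $p + 1/\alpha_n$ as its estimate of $\lambda_1$.

```python
def solve(a, b):
    """Solve a z = b by Gaussian elimination with row interchanges."""
    n = len(a)
    m = [[float(x) for x in row] + [float(b_i)] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("A - pI is singular: p is an eigenvalue")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - s) / m[i][i]
    return z


def modified_inverse_iteration(a, p, e0, steps):
    """Return (estimate of the eigenvalue closest to p, scaled vector e_n)."""
    n = len(a)
    shifted = [[a[i][j] - (p if i == j else 0) for j in range(n)]
               for i in range(n)]
    e = list(e0)
    alpha = None
    for _ in range(steps):
        z = solve(shifted, e)            # step (i):   solve (A - pI) z = e
        alpha = max(z, key=abs)          # step (ii):  component of largest magnitude
        e = [z_i / alpha for z_i in z]   # step (iii): scale
    return p + 1 / alpha, e


estimate, e = modified_inverse_iteration([[3, 2], [1, 4]], 1.5, [1, 0], 30)
print(estimate, e)
```

For this matrix, whose eigenvalues are 5 and 2, the estimate approaches 2 and the vector approaches $[1 \;\; {-0.5}]^T$, a scalar multiple of the eigenvector $[-2 \;\; 1]^T$.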


In the following exercise, much of the work has been done for you. 


*Exercise 4.6 


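We wish to obtain an approximation to an eigenvector corresponding to the
eigenvalue closest to $p = -1$ for the matrix

$$\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 3 & 4 & 6 \end{bmatrix}.$$

We have

$$(\mathbf{A} - p\mathbf{I})^{-1} = (\mathbf{A} + \mathbf{I})^{-1} = \begin{bmatrix} \tfrac{3}{2} & -\tfrac{1}{4} & -\tfrac{1}{2} \\ -\tfrac{1}{4} & \tfrac{5}{8} & -\tfrac{1}{4} \\ -\tfrac{1}{2} & -\tfrac{1}{4} & \tfrac{1}{2} \end{bmatrix}.$$

(This inverse was found using the computer algebra package for the course.)

Applying Procedure 4.3(a) with $\mathbf{e}_0 = [1 \;\; 0 \;\; 0]^T$ gives $\alpha_{20} = 1.725$ and

$$\mathbf{e}_{20} \simeq [1 \;\; {-0.141} \;\; {-0.379}]^T.$$

Calculate $\mathbf{e}_{21}$ with the components given to three decimal places, and obtain
an estimate for the corresponding eigenvalue to three decimal places.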


Procedures 4.1, 4.2 and 4.3 can be used to find individual eigenvalues and 
eigenvectors of a matrix. If we require all the eigenvalues and/or eigenvec- 
tors, then there are more efficient methods that can be used, though we do 
not discuss them here. 


The rate of convergence of each of the methods depends on the relative
closeness in magnitude of the other eigenvalues of $\mathbf{A}$ (or $\mathbf{A}^{-1}$, or $(\mathbf{A} - p\mathbf{I})^{-1}$)
to the required eigenvalue. For example, in Exercise 4.6 the eigenvalues
are 10.187, −0.420 and 0.234, to three decimal places. The direct iteration
method applied to this problem would converge very rapidly since the largest
eigenvalue, 10.187, is much larger in magnitude than the other two. On the
other hand, inverse iteration would be slower, since the eigenvalues of $\mathbf{A}^{-1}$
are 4.281, −2.379 and 0.098 (to 3 d.p.), and the second largest eigenvalue in
magnitude, −2.379, is just over half the magnitude of the largest eigenvalue.

A judicious choice of $p$ for the modified inverse iteration method can
significantly increase the rate of convergence. For example, choosing $p = 0.2$ for
the matrix in Exercise 4.6 gives the eigenvalues of $\mathbf{A} - p\mathbf{I}$ as 9.987, −0.620
and 0.034, so the eigenvalues of $(\mathbf{A} - p\mathbf{I})^{-1}$ are 29.781, −1.612 and 0.100.
We expect that the modified inverse iteration method with this value of $p$
would converge very rapidly to the eigenvalue 0.234 of the matrix $\mathbf{A}$.
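These remarks can be given a rough quantitative reading. The estimate used below is an assumption not stated in the text: that each iteration multiplies the largest unwanted component by about the ratio of the second largest to the largest eigenvalue magnitude of the matrix being applied, in the same way that the powers of 2 fell behind the powers of 5 in Subsection 4.1. The three ratios for the eigenvalues quoted above are:

```latex
\[
\text{direct iteration: } \frac{0.420}{10.187} \approx 0.041, \qquad
\text{inverse iteration: } \frac{2.379}{4.281} \approx 0.556, \qquad
\text{modified inverse iteration } (p = 0.2)\text{: } \frac{1.612}{29.781} \approx 0.054 .
\]
```

The smaller the ratio, the fewer iterations are needed for a given accuracy, which agrees with the ordering described above.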


Procedures 4.1 and 4.2 cannot be used to determine complex eigenvalues, 
since complex eigenvalues of real matrices occur in complex conjugate pairs, 
and a complex eigenvalue and its complex conjugate have the same magni- 
tude. Both procedures fail to find the required eigenvalue when there is a 
second distinct eigenvalue of the same magnitude. 


End-of-section Exercises 


Exercise 4.7 


In this exercise A = ke i 


14 —0.3 
(a) Given $\mathbf{e}_0 = [1 \;\; 0]^T$, use direct iteration to calculate $\mathbf{e}_1$, $\mathbf{e}_2$ and $\mathbf{e}_3$.


(b) Use the methods of Section 2 to find the eigenvalues and corresponding
eigenvectors of $\mathbf{A}$.


(c) To which eigenvector would you expect the sequence $\mathbf{e}_n$ of part (a) to
converge?


(d) If $\mathbf{v}$ is an eigenvector of $\mathbf{A}$ corresponding to the eigenvalue $\lambda$, express
$\mathbf{A}^8\mathbf{v}$ in terms of $\lambda$ and $\mathbf{v}$.


(e) Designate the eigenvectors found in part (b) as $\mathbf{v}_1$ and $\mathbf{v}_2$. Express $\mathbf{e}_0$
in terms of $\mathbf{v}_1$ and $\mathbf{v}_2$, then calculate $\mathbf{e}_8$. (If this seems like hard work,
then look for an easier method.)


(f) Find $\mathbf{A}^{-1}$, then use inverse iteration to calculate $\mathbf{e}_1$, $\mathbf{e}_2$ and $\mathbf{e}_3$, given
$\mathbf{e}_0 = [-0.4 \;\; 1]^T$. To which eigenvector would you expect this sequence
to converge?


(g) Comment on the rates of convergence for direct iteration and inverse
iteration applied to this problem.

Exercise 4.8


Suppose that you wish to find all the eigenvalues of the matrix

$$\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 3 & 4 & 6 \end{bmatrix}.$$

Further, suppose that you have used direct iteration to find an eigenvector

$$\mathbf{v}_1 \simeq [0.477 \;\; 0.689 \;\; 1]^T,$$

and inverse iteration to find another eigenvector

$$\mathbf{v}_2 \simeq [-0.102 \;\; 1 \;\; {-0.641}]^T.$$

Use this information to find approximations to all three eigenvalues.

However, it is possible to use the modified inverse iteration method to find
complex eigenvalues, provided that a complex value for $p$ is chosen so that
$p$ is closer in magnitude to one of the complex eigenvalues than to any of
the others.


These calculations are intended to be done by hand (with the aid of a
scientific calculator).


Before you start this exercise, consider how the third eigenvalue can be
obtained from the other two eigenvalues. Applying this method will save
excessive computation.




5 Using a computer to find eigenvalues 
and eigenvectors 


In this section you are asked to use the three procedures in Section 4 on
your computer to estimate some of the eigenvalues and eigenvectors of
various matrices. This will enable you to gain experience in the use of these
algorithms without doing excessive hand calculation.


The computer activities involve the following matrices. 


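(a) $\mathbf{P} = \begin{bmatrix} 2.86 & 4.16 & 14.56 & 3.64 \\ -0.34 & 3.58 & 11.62 & 0.68 \\ 0.20 & -0.80 & -2.54 & 0.40 \\ 1.82 & 4.16 & 14.56 & 2.60 \end{bmatrix}$

(b) $\mathbf{Q} = \begin{bmatrix} 3.179 & 0.107 & -3.310 & -1.183 & -0.587 \\ 0.636 & 3.576 & 2.054 & 0.486 & 0.263 \\ -3.308 & 5.588 & 2.591 & -0.595 & -2.492 \\ 0.719 & -0.401 & -4.406 & -0.040 & -0.803 \\ 0.600 & -0.732 & -0.622 & -2.992 & 4.695 \end{bmatrix}$

(c) $\mathbf{R} = \begin{bmatrix} 5.27 & -30.46 & 7.96 & -11.22 \\ 5.25 & 6.00 & -6.21 & -4.73 \\ 7.20 & -11.52 & -16.14 & -1.08 \\ -19.97 & -34.98 & 19.01 & 3.97 \end{bmatrix}$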
Bae 4.16 14.56 i 
0.34 2.79 11.62 0.68 
(4) S=]) p99 _0.80 —3.33 0.40 
1.82 416 14.56 1.81 


Use your computer to carry out the following activities. 


Activity 5.1 


Find the eigenvalue of largest magnitude and a corresponding eigenvector 
using Procedure 4.1 (direct iteration) for the matrices P, Q, R and S. 
Comment on the usefulness of this method applied to these matrices. 


Activity 5.2 


Find the eigenvalue of smallest magnitude and a corresponding eigenvector 
using Procedure 4.2 (inverse iteration) for the matrices P, Q, R and S. 
Comment on the usefulness of this method applied to these matrices. 


Activity 5.3 


Use Procedure 4.3 (modified inverse iteration) to find more efficiently: 
(i) the eigenvalues of largest and smallest magnitude of R; 


(ii) the eigenvalue of largest magnitude of S. 
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Outcomes 


After studying this unit you should be able to:

explain the meaning of the terms eigenvector, eigenvalue and characteristic
equation;

calculate the eigenvalues of a given 2 x 2 matrix, and find the corre- 
sponding eigenvectors; 

calculate the eigenvalues and corresponding eigenvectors of a 3 x 3 ma- 
trix, where one of the eigenvalues is ‘obvious’; 

appreciate that an n x n matrix with n distinct eigenvalues gives rise to 
n linearly independent eigenvectors; 

appreciate that the eigenvalues of a matrix may be real and distinct, 
real and repeated, or complex; 

recall that the sum of the eigenvalues of an n x n matrix A is trA 
and that the product of the eigenvalues of A is det A, and use these 
properties as a check in hand calculations; 

write down the eigenvalues of a triangular matrix; 

write down the eigenvalues of the matrices A”, A~!, A+ pl, (A — pI)~! 
and pA, given the eigenvalues of A; 

appreciate the use of direct, inverse and modified inverse iteration in 
approximating individual eigenvalues and corresponding eigenvectors of 
a square matrix; 

use iterative methods and hand calculation to estimate an eigenvalue 
and corresponding eigenvector in simple cases; 

use the computer algebra package for the course to determine the eigen- 
values and corresponding eigenvectors of a given square matrix. 


Solutions to the exercises 


Section 1 


a=? 7 
arlene! 
a(R Eh 


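1.2 If $[x \;\; y]^T = \alpha[1 \;\; 1]^T + \beta[-2 \;\; 1]^T$, then

$$\begin{cases} \alpha - 2\beta = x, \\ \alpha + \beta = y. \end{cases}$$

Solving these equations, we obtain $\alpha = (2y + x)/3$ and
$\beta = (y - x)/3$.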


is 2 IE)-[] 0] 


so [3 2]” is an eigenvector with eigenvalue 4. 


© [Pi Lal-[a]-<9L 4]: 


so [1 —1]” is an eigenvector with eigenvalue —1. 


oF UGG] 


so [0 6]7 is an eigenvector with eigenvalue 2. 


1.4 Since $\mathbf{v} = [12\,000 \;\; 6000]^T$ is transformed to itself,
this $\mathbf{v}$ is an eigenvector with eigenvalue 1.

You may have noticed that there are many other eigenvectors
with the same eigenvalue, for example $[12 \;\; 6]^T$.
There is another eigenvector $[1 \;\; {-1}]^T$, with corresponding
eigenvalue 0.7 (although we do not expect you
to have found it).


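1.5 The eigenvectors act along the line of reflection
$y = x$ and perpendicular to it, so they are the scalar
multiples of $[1 \;\; 1]^T$ and $[1 \;\; {-1}]^T$. The vector $[1 \;\; 1]^T$
is scaled by a factor of 1 by the transformation, while
for $[1 \;\; {-1}]^T$ the scale factor is −1; these scale factors
are the corresponding eigenvalues.


We may check our conclusion by evaluating

$$\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix},$$

so $[1 \;\; 1]^T$ corresponds to the eigenvalue 1, and

$$\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \end{bmatrix} = -\begin{bmatrix} 1 \\ -1 \end{bmatrix},$$

so $[1 \;\; {-1}]^T$ corresponds to the eigenvalue −1.


1.6 $\begin{bmatrix} 1 & 4 \\ 9 & 1 \end{bmatrix} \begin{bmatrix} 2 \\ -3 \end{bmatrix} = \begin{bmatrix} -10 \\ 15 \end{bmatrix} = -5 \begin{bmatrix} 2 \\ -3 \end{bmatrix}$,
so $[2 \;\; {-3}]^T$ is an eigenvector with eigenvalue −5. Also,
$\begin{bmatrix} 1 & 4 \\ 9 & 1 \end{bmatrix} \begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 14 \\ 21 \end{bmatrix} = 7 \begin{bmatrix} 2 \\ 3 \end{bmatrix}$,
so $[2 \;\; 3]^T$ is an eigenvector with eigenvalue 7.

1.7 $[0 \;\; 1]^T$ is an eigenvector of $\mathbf{A}$ corresponding to the
eigenvalue 1, and $[1 \;\; 0]^T$ is an eigenvector of $\mathbf{A}$ corresponding
to the eigenvalue −1.

1.8 We have

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = 2 \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \qquad \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} 3 \\ 1 \end{bmatrix} = 1 \begin{bmatrix} 3 \\ 1 \end{bmatrix},$$

from which we obtain the systems of equations

$$\begin{cases} a + 2b = 2, \\ 3a + b = 3, \end{cases} \qquad \begin{cases} c + 2d = 4, \\ 3c + d = 1. \end{cases}$$

Solving these equations, we obtain $a = \tfrac{4}{5}$, $b = \tfrac{3}{5}$,
$c = -\tfrac{2}{5}$ and $d = \tfrac{11}{5}$, so

$$\mathbf{A} = \begin{bmatrix} \tfrac{4}{5} & \tfrac{3}{5} \\ -\tfrac{2}{5} & \tfrac{11}{5} \end{bmatrix}.$$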

Section 2

2.1 The equation $\mathbf{A}\mathbf{v} = \lambda\mathbf{v}$ becomes

$$\begin{bmatrix} 5 & 2 \\ 2 & 5 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \lambda \begin{bmatrix} x \\ y \end{bmatrix}.$$

Thus $x$ and $y$ satisfy the simultaneous equations

$$\begin{cases} 5x + 2y = \lambda x, \\ 2x + 5y = \lambda y, \end{cases}$$

which can be rewritten as the eigenvector equations

$$\begin{cases} (5 - \lambda)x + 2y = 0, \\ 2x + (5 - \lambda)y = 0. \end{cases}$$

These equations have a non-zero solution only if

$$\begin{vmatrix} 5 - \lambda & 2 \\ 2 & 5 - \lambda \end{vmatrix} = 0.$$

So $(5 - \lambda)(5 - \lambda) - 4 = 0$, i.e. $5 - \lambda = \pm 2$, so the
eigenvalues are $\lambda = 7$ and $\lambda = 3$.

$\lambda = 7$: The eigenvector equations become

$$-2x + 2y = 0, \qquad 2x - 2y = 0.$$

These equations reduce to the single equation $y = x$, so
an eigenvector corresponding to $\lambda = 7$ is $[1 \;\; 1]^T$.

$\lambda = 3$: The eigenvector equations become

$$2x + 2y = 0, \qquad 2x + 2y = 0.$$

These equations reduce to the single equation $y = -x$,
so an eigenvector corresponding to $\lambda = 3$ is $[1 \;\; {-1}]^T$.

2.2 (a) The characteristic equation is

$$\begin{vmatrix} 1 - \lambda & 4 \\ 1 & -2 - \lambda \end{vmatrix} = 0.$$

Expanding gives $(1 - \lambda)(-2 - \lambda) - 4 = 0$, which simplifies
to $\lambda^2 + \lambda - 6 = 0$. (Alternatively, you could have
calculated $a + d = -1$ and $ad - bc = -6$, and so obtained
this equation directly.) So the eigenvalues are
$\lambda = 2$ and $\lambda = -3$.

The eigenvector equations are

$$\begin{cases} (1 - \lambda)x + 4y = 0, \\ x + (-2 - \lambda)y = 0. \end{cases}$$

$\lambda = 2$: The eigenvector equations become

$$-x + 4y = 0 \quad \text{and} \quad x - 4y = 0,$$

which reduce to the single equation $4y = x$. So an eigenvector
corresponding to $\lambda = 2$ is $[4 \;\; 1]^T$.

$\lambda = -3$: The eigenvector equations become

$$4x + 4y = 0 \quad \text{and} \quad x + y = 0,$$

which reduce to the single equation $y = -x$. So an
eigenvector corresponding to $\lambda = -3$ is $[1 \;\; {-1}]^T$.

(b) The characteristic equation is

$$\begin{vmatrix} 8 - \lambda & -5 \\ 10 & -7 - \lambda \end{vmatrix} = 0.$$

Expanding this gives $(8 - \lambda)(-7 - \lambda) + 50 = 0$, which
simplifies to $\lambda^2 - \lambda - 6 = 0$. (Alternatively, you could
have calculated $a + d = 1$ and $ad - bc = -6$, and so obtained
this equation directly.) So the eigenvalues are
$\lambda = 3$ and $\lambda = -2$.

$\lambda = 3$: The eigenvector equations become

$$5x - 5y = 0 \quad \text{and} \quad 10x - 10y = 0,$$

which reduce to the single equation $y = x$. So an eigenvector
corresponding to $\lambda = 3$ is $[1 \;\; 1]^T$.

$\lambda = -2$: The eigenvector equations become

$$10x - 5y = 0 \quad \text{and} \quad 10x - 5y = 0,$$

which reduce to the single equation $y = 2x$. So an eigenvector
corresponding to $\lambda = -2$ is $[1 \;\; 2]^T$.


2.3 (a) Example 2.2:

sum of eigenvalues $= 5 + 2 = 7$
and $\operatorname{tr} \mathbf{A} = 3 + 4 = 7$;

product of eigenvalues $= 5 \times 2 = 10$
and $\det \mathbf{A} = (3 \times 4) - (2 \times 1) = 10$.

Example 2.3:

sum of eigenvalues $= 4 + (-1) = 3$
and $\operatorname{tr} \mathbf{A} = 2 + 1 = 3$;

product of eigenvalues $= 4 \times (-1) = -4$
and $\det \mathbf{A} = (2 \times 1) - (3 \times 2) = -4$.

(b) Exercise 2.1:

sum of eigenvalues $= 7 + 3 = 10$
and $\operatorname{tr} \mathbf{A} = 5 + 5 = 10$;

product of eigenvalues $= 7 \times 3 = 21$
and $\det \mathbf{A} = (5 \times 5) - (2 \times 2) = 21$.

Exercise 2.2(a):

sum of eigenvalues $= 2 + (-3) = -1$
and $\operatorname{tr} \mathbf{A} = 1 + (-2) = -1$;

product of eigenvalues $= 2 \times (-3) = -6$
and $\det \mathbf{A} = (1 \times (-2)) - (4 \times 1) = -6$.

Exercise 2.2(b):

sum of eigenvalues $= 3 + (-2) = 1$
and $\operatorname{tr} \mathbf{A} = 8 + (-7) = 1$;

product of eigenvalues $= 3 \times (-2) = -6$
and $\det \mathbf{A} = (8 \times (-7)) - ((-5) \times 10) = -6$.

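2.4 The characteristic equation is

$$\begin{vmatrix} p - \lambda & 0 \\ 0 & q - \lambda \end{vmatrix} = (p - \lambda)(q - \lambda) = 0.$$

Thus the eigenvalues are $\lambda = p$ and $\lambda = q$.

The eigenvector equations are

$$\begin{cases} (p - \lambda)x = 0, \\ (q - \lambda)y = 0. \end{cases}$$

$\lambda = p$: The eigenvector equations become

$$0 = 0 \quad \text{and} \quad (q - p)y = 0,$$

which reduce to the single equation $y = 0$ (since $p \neq q$),
so a corresponding eigenvector is $[1 \;\; 0]^T$.

$\lambda = q$: The eigenvector equations become

$$(p - q)x = 0 \quad \text{and} \quad 0 = 0,$$

which reduce to the single equation $x = 0$ (since $p \neq q$),
so a corresponding eigenvector is $[0 \;\; 1]^T$.

These agree with the eigenvectors found in Section 1.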


2.5 The characteristic equation is

$$\lambda^2 - 1.7\lambda + 0.7 = 0.$$

The eigenvalues are $\lambda = 1$ and $\lambda = 0.7$.
The eigenvector equations are

$$\begin{cases} (0.9 - \lambda)x + 0.2y = 0, \\ 0.1x + (0.8 - \lambda)y = 0. \end{cases}$$

$\lambda = 1$: The eigenvector equations become

$$-0.1x + 0.2y = 0 \quad \text{and} \quad 0.1x - 0.2y = 0,$$

which reduce to the single equation $2y = x$, so a corresponding
eigenvector is $[2 \;\; 1]^T$.

(In the migration problem, where the total population was 18 000, an
eigenvector corresponding to $\lambda = 1$ was found to be $[12\,000 \;\; 6000]^T$.
This is a multiple of $[2 \;\; 1]^T$, as expected, giving stable populations of
12 000 in Exton and 6000 in Wyeville.)

$\lambda = 0.7$: The eigenvector equations become

$$0.2x + 0.2y = 0 \quad \text{and} \quad 0.1x + 0.1y = 0,$$

which reduce to the single equation $y = -x$, so a corresponding
eigenvector is $[1 \;\; {-1}]^T$.

(Since populations cannot be negative, this solution has
no relevance for the migration problem.)


2.6 (a) The characteristic equation is

$$\begin{vmatrix} a - \lambda & 0 \\ 0 & a - \lambda \end{vmatrix} = (a - \lambda)^2 = 0,$$

for which $\lambda = a$ is a repeated root.

The eigenvector equations are

$$\begin{cases} (a - \lambda)x = 0, \\ (a - \lambda)y = 0, \end{cases}$$

which for $\lambda = a$ become

$$0 = 0 \quad \text{and} \quad 0 = 0,$$

which are satisfied by all values of $x$ and $y$, so the eigenvectors
are all the non-zero vectors of the form $[k \;\; l]^T$.
(Any non-zero vector is an eigenvector, but it is possible
to choose two eigenvectors that are linearly independent,
for example, $[1 \;\; 0]^T$ and $[0 \;\; 1]^T$.)

(b) The characteristic equation is

$$\begin{vmatrix} a - \lambda & 1 \\ 0 & a - \lambda \end{vmatrix} = (a - \lambda)^2 = 0,$$

for which $\lambda = a$ is a repeated root.

The eigenvector equations for $\lambda = a$ are

$$\begin{cases} y = 0, \\ 0 = 0, \end{cases}$$

so we have a single equation $y = 0$, and a corresponding
eigenvector is $[1 \;\; 0]^T$.

(In this case, there is only one linearly independent
eigenvector.)


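2.7 The characteristic equation is $\lambda^2 - 4\lambda + 5 = 0$.
The eigenvalues are

$$\lambda = \tfrac{1}{2}\left(4 \pm \sqrt{16 - 20}\right) = \tfrac{1}{2}(4 \pm 2i) = 2 \pm i,$$

i.e. $\lambda = 2 + i$ and $\lambda = 2 - i$.
The eigenvector equations are

$$\begin{cases} (3 - \lambda)x - y = 0, \\ 2x + (1 - \lambda)y = 0. \end{cases}$$

$\lambda = 2 + i$: The eigenvector equations become

$$(1 - i)x - y = 0 \quad \text{and} \quad 2x - (1 + i)y = 0,$$

which reduce to the single equation $y = (1 - i)x$ (since
$(1 + i)(1 - i) = 2$), so a corresponding eigenvector is
$[1 \;\; 1 - i]^T$.

$\lambda = 2 - i$: The eigenvector equations become

$$(1 + i)x - y = 0 \quad \text{and} \quad 2x - (1 - i)y = 0,$$

which reduce to the single equation $y = (1 + i)x$, so a
corresponding eigenvector is $[1 \;\; 1 + i]^T$.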


2.8 The eigenvalues are 1 and 2. 
A=1] The eigenvector equations become 
3y=0 and y=0, 


which reduce to the single equation y = 0, so [1 0]7 is 
a corresponding eigenvector. 


A=2] The eigenvector equations become 
—x+3y=0 and 0=0, 


which reduce to the single equation 3y = x, so [3 1]" 
is a corresponding eigenvector. 


Solutions to the exercises 


2.9 For the eigenvalue to be repeated, we require
$$\sqrt{(a+d)^2 - 4(ad - b^2)} = 0, \quad\text{i.e.}\quad (a-d)^2 + 4b^2 = 0.$$
This is true only if $a = d$ and $b = 0$, so the only symmetric
$2 \times 2$ matrices with a repeated eigenvalue are of
the form $\begin{bmatrix} a & 0 \\ 0 & a \end{bmatrix}$.

2.10 (a) The eigenvalues are real, since $A$ is real and
symmetric. One is positive and the other negative, since
$\lambda_1\lambda_2 = \det A < 0$. Also, $\lambda_1 + \lambda_2 = \operatorname{tr} A = 50$.


(b) The eigenvalues are the diagonal entries 67 and
$-17$, since $A$ is triangular.


(c) The eigenvalues are real, since $A$ is real and symmetric.
In fact, $A$ is non-invertible, since $\det A = 0$.
Thus one eigenvalue is 0. Hence the other is 306, since
$0 + \lambda_2 = \operatorname{tr} A = 306$.


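2.11 (a) (i) $A^2 = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}\begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix} = \begin{bmatrix} 11 & 14 \\ 7 & 18 \end{bmatrix}$.

The characteristic equation of $A^2$ is
$$\lambda^2 - 29\lambda + 100 = 0.$$
So the eigenvalues of $A^2$ are $\lambda = 25$ and $\lambda = 4$. These
are the squares of the eigenvalues of $A$.


(ii) $A^{-1} = \dfrac{1}{10}\begin{bmatrix} 4 & -2 \\ -1 & 3 \end{bmatrix} = \begin{bmatrix} 0.4 & -0.2 \\ -0.1 & 0.3 \end{bmatrix}$.

The characteristic equation of $A^{-1}$ is
$$\lambda^2 - 0.7\lambda + 0.1 = 0.$$
So the eigenvalues of $A^{-1}$ are $\lambda = 0.5$ and $\lambda = 0.2$.
These are the reciprocals of the eigenvalues of $A$.


(iii) $A + 2I = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix} + \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} = \begin{bmatrix} 5 & 2 \\ 1 & 6 \end{bmatrix}$.

The characteristic equation of $A + 2I$ is
$$\lambda^2 - 11\lambda + 28 = 0.$$
So the eigenvalues of $A + 2I$ are $\lambda = 7$ and $\lambda = 4$. These
can be obtained by adding 2 to the eigenvalues of $A$.


(iv) $(A - 4I)^{-1} = \begin{bmatrix} -1 & 2 \\ 1 & 0 \end{bmatrix}^{-1} = \begin{bmatrix} 0 & 1 \\ 0.5 & 0.5 \end{bmatrix}$.

The characteristic equation of $(A - 4I)^{-1}$ is
$$\lambda^2 - 0.5\lambda - 0.5 = 0.$$
So the eigenvalues of $(A - 4I)^{-1}$ are $\lambda = 1$ and $\lambda = -0.5$.
These can be obtained by subtracting 4 from the
eigenvalues of $A$ and then finding the reciprocals.


(v) $3A = \begin{bmatrix} 9 & 6 \\ 3 & 12 \end{bmatrix}$ and the characteristic equation
of $3A$ is $(9 - \lambda)(12 - \lambda) - 18 = 0$, which simplifies to
$\lambda^2 - 21\lambda + 90 = 0$. Thus the eigenvalues of $3A$ are
$\lambda = 15$ and $\lambda = 6$, which are three times those of $A$.


(b) (i) $A^2\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 11 & 14 \\ 7 & 18 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 25 \\ 25 \end{bmatrix} = 25\begin{bmatrix} 1 \\ 1 \end{bmatrix}$,

$A^2\begin{bmatrix} -2 \\ 1 \end{bmatrix} = \begin{bmatrix} 11 & 14 \\ 7 & 18 \end{bmatrix}\begin{bmatrix} -2 \\ 1 \end{bmatrix} = \begin{bmatrix} -8 \\ 4 \end{bmatrix} = 4\begin{bmatrix} -2 \\ 1 \end{bmatrix}$,

so the eigenvectors of $A$ are also eigenvectors of $A^2$.


(ii) $A^{-1}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0.4 & -0.2 \\ -0.1 & 0.3 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0.2 \\ 0.2 \end{bmatrix} = 0.2\begin{bmatrix} 1 \\ 1 \end{bmatrix}$,

$A^{-1}\begin{bmatrix} -2 \\ 1 \end{bmatrix} = \begin{bmatrix} 0.4 & -0.2 \\ -0.1 & 0.3 \end{bmatrix}\begin{bmatrix} -2 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 0.5 \end{bmatrix} = 0.5\begin{bmatrix} -2 \\ 1 \end{bmatrix}$,

so the eigenvectors of $A$ are also eigenvectors of $A^{-1}$.


(iii) $(A + 2I)\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 5 & 2 \\ 1 & 6 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 7 \\ 7 \end{bmatrix} = 7\begin{bmatrix} 1 \\ 1 \end{bmatrix}$,

$(A + 2I)\begin{bmatrix} -2 \\ 1 \end{bmatrix} = \begin{bmatrix} 5 & 2 \\ 1 & 6 \end{bmatrix}\begin{bmatrix} -2 \\ 1 \end{bmatrix} = \begin{bmatrix} -8 \\ 4 \end{bmatrix} = 4\begin{bmatrix} -2 \\ 1 \end{bmatrix}$,

so the eigenvectors of $A$ are also eigenvectors of $A + 2I$.


(iv) $(A - 4I)^{-1}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0.5 & 0.5 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} = 1\begin{bmatrix} 1 \\ 1 \end{bmatrix}$,

$(A - 4I)^{-1}\begin{bmatrix} -2 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0.5 & 0.5 \end{bmatrix}\begin{bmatrix} -2 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ -0.5 \end{bmatrix} = -0.5\begin{bmatrix} -2 \\ 1 \end{bmatrix}$,

so the eigenvectors of $A$ are also eigenvectors of
$(A - 4I)^{-1}$.


(v) $3A\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 9 & 6 \\ 3 & 12 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 15 \\ 15 \end{bmatrix} = 15\begin{bmatrix} 1 \\ 1 \end{bmatrix}$,

$3A\begin{bmatrix} -2 \\ 1 \end{bmatrix} = \begin{bmatrix} 9 & 6 \\ 3 & 12 \end{bmatrix}\begin{bmatrix} -2 \\ 1 \end{bmatrix} = \begin{bmatrix} -12 \\ 6 \end{bmatrix} = 6\begin{bmatrix} -2 \\ 1 \end{bmatrix}$,

so the eigenvectors of $A$ are also eigenvectors of $3A$.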


2.12 (a) (i) $4^3$ and $(-1)^3$, i.e. 64 and $-1$.

(ii) $4^{-1}$ and $(-1)^{-1}$, i.e. $\tfrac{1}{4}$ and $-1$.

(iii) $4 - 6$ and $(-1) - 6$, i.e. $-2$ and $-7$.

(iv) $(4 + 3)^{-1}$ and $((-1) + 3)^{-1}$, i.e. $\tfrac{1}{7}$ and $\tfrac{1}{2}$.


(b) The eigenvalues of $A - 4I$ are $4 - 4 = 0$ and $-1 - 4 = -5$.
The matrix $A - 4I$ is non-invertible because one
of the eigenvalues is 0, so the inverse does not exist.


2.13 (a) Using Procedure 2.1, we solve the characteristic
equation $\det(A - \lambda I) = 0$, which can be written as
$$\begin{vmatrix} 1-\lambda & 2 \\ 3 & -4-\lambda \end{vmatrix} = \lambda^2 + 3\lambda - 10 = 0.$$
The eigenvalues are $\lambda = -5$ and $\lambda = 2$. Solving the
eigenvector equations for each eigenvalue, we obtain
corresponding eigenvectors $[1 \;\; -3]^T$ and $[2 \;\; 1]^T$,
respectively.


The eigenvalues of $A^{10}$ are $(-5)^{10}$ and $2^{10}$, corresponding
to eigenvectors $[1 \;\; -3]^T$ and $[2 \;\; 1]^T$, respectively.

(b) $\begin{bmatrix} 3 & 2 \\ 3 & -2 \end{bmatrix} = A + 2I$, where $A$ is the matrix of
part (a).
So the eigenvalues are $-5 + 2 = -3$ and $2 + 2 = 4$, with
corresponding eigenvectors $[1 \;\; -3]^T$ and $[2 \;\; 1]^T$,
respectively.
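
The claim in Solution 2.13(a) about $A^{10}$ can be confirmed with exact integer arithmetic. This is an illustrative sketch only; the function names `matvec` and `apply_power` are our own, and the matrix is read off from the characteristic determinant above.

```python
def matvec(A, v):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def apply_power(A, v, n):
    """Return A^n v by repeated multiplication."""
    for _ in range(n):
        v = matvec(A, v)
    return v

A = [[1, 2], [3, -4]]          # matrix of Solution 2.13(a)
v1, v2 = [1, -3], [2, 1]       # eigenvectors for lambda = -5 and lambda = 2

w1 = apply_power(A, v1, 10)    # should equal (-5)**10 * v1
w2 = apply_power(A, v2, 10)    # should equal 2**10 * v2
print(w1, w2)
```

Because Python integers are exact, the comparison with $(-5)^{10}\mathbf{v}_1$ and $2^{10}\mathbf{v}_2$ involves no rounding error.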


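2.14 The characteristic equation is
$$\lambda^2 - (2\cos\theta)\lambda + 1 = 0,$$
since $\sin^2\theta + \cos^2\theta = 1$. Using the quadratic equation
formula, the eigenvalues are
$$\lambda = \tfrac{1}{2}\left(2\cos\theta \pm \sqrt{4\cos^2\theta - 4}\right) = \tfrac{1}{2}\left(2\cos\theta \pm \sqrt{-4\sin^2\theta}\right) = \cos\theta \pm i\sin\theta.$$
So the eigenvalues are $\cos\theta + i\sin\theta$ and $\cos\theta - i\sin\theta$.
The eigenvector equations are
$$(\cos\theta - \lambda)x - (\sin\theta)y = 0,$$
$$(\sin\theta)x + (\cos\theta - \lambda)y = 0.$$


$\lambda = \cos\theta + i\sin\theta$: The eigenvector equations become
$$-(i\sin\theta)x - (\sin\theta)y = 0 \quad\text{and}\quad (\sin\theta)x - (i\sin\theta)y = 0,$$
which reduce to the single equation $iy = x$ (since
$\sin\theta \neq 0$ as $\theta$ is not an integer multiple of $\pi$), so a
corresponding eigenvector is $[i \;\; 1]^T$.


$\lambda = \cos\theta - i\sin\theta$: The eigenvector equations become
$$(i\sin\theta)x - (\sin\theta)y = 0 \quad\text{and}\quad (\sin\theta)x + (i\sin\theta)y = 0,$$
which reduce to the single equation $-iy = x$ (since
$\sin\theta \neq 0$), so a corresponding eigenvector is $[-i \;\; 1]^T$.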


Section 3 
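3.1 $\begin{bmatrix} 5 & 0 & 0 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 3 \\ 3 \end{bmatrix} = 3\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$, so
the corresponding eigenvalue is 3.

$\begin{bmatrix} 5 & 0 & 0 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}$, so
the corresponding eigenvalue is 1.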


3.2 The eigenvector equations are
$$(1-\lambda)x_1 - x_3 = 0,$$
$$x_1 + (2-\lambda)x_2 + x_3 = 0,$$
$$2x_1 + 2x_2 + (3-\lambda)x_3 = 0.$$


$\lambda = 1$: The eigenvector equations become
$$-x_3 = 0, \qquad x_1 + x_2 + x_3 = 0, \qquad 2x_1 + 2x_2 + 2x_3 = 0,$$
which reduce to the equations $x_3 = 0$ and $x_2 = -x_1$, so
a corresponding eigenvector is $[1 \;\; -1 \;\; 0]^T$.

$\lambda = 2$: The eigenvector equations become
$$-x_1 - x_3 = 0, \qquad x_1 + x_3 = 0, \qquad 2x_1 + 2x_2 + x_3 = 0,$$
which reduce to the equations $-x_3 = x_1$ and $-2x_2 = x_1$,
so a corresponding eigenvector is $[-2 \;\; 1 \;\; 2]^T$.

$\lambda = 3$: The eigenvector equations become
$$-2x_1 - x_3 = 0, \qquad x_1 - x_2 + x_3 = 0, \qquad 2x_1 + 2x_2 = 0,$$
which reduce to the equations $x_3 = -2x_1$ and $x_2 = -x_1$,
so a corresponding eigenvector is $[1 \;\; -1 \;\; -2]^T$.


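3.3 The characteristic equation is
$$\begin{vmatrix} -\lambda & 0 & 6 \\ \tfrac{1}{2} & -\lambda & 0 \\ 0 & \tfrac{1}{3} & -\lambda \end{vmatrix} = 0.$$
Expanding the determinant gives
$$-\lambda\begin{vmatrix} -\lambda & 0 \\ \tfrac{1}{3} & -\lambda \end{vmatrix} + 6\begin{vmatrix} \tfrac{1}{2} & -\lambda \\ 0 & \tfrac{1}{3} \end{vmatrix} = 0,$$
which simplifies to $\lambda^3 - 1 = 0$. Since $\lambda = 1$ satisfies
this equation, it is an eigenvalue of $A$. (The other two
eigenvalues are complex numbers.)

For $\lambda = 1$, the eigenvector equations become
$$-x_1 + 6x_3 = 0, \qquad \tfrac{1}{2}x_1 - x_2 = 0, \qquad \tfrac{1}{3}x_2 - x_3 = 0,$$
which reduce to the equations $x_1 = 6x_3$ and $x_2 = 3x_3$,
so an eigenvector is $[6 \;\; 3 \;\; 1]^T$.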


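3.4 (a) The characteristic equation is
$$\begin{vmatrix} -\lambda & 0 & 1 \\ 0 & -\lambda & 0 \\ 1 & 0 & -\lambda \end{vmatrix} = 0.$$
Expanding the determinant gives
$$-\lambda\begin{vmatrix} -\lambda & 0 \\ 0 & -\lambda \end{vmatrix} + 1\begin{vmatrix} 0 & -\lambda \\ 1 & 0 \end{vmatrix} = 0,$$
which simplifies to $-\lambda^3 + \lambda = 0$, or $\lambda(\lambda^2 - 1) = 0$, so
the eigenvalues are $\lambda = 0$, $\lambda = -1$ and $\lambda = 1$.


(b) The matrix is triangular, so, from Subsection 2.3,
the eigenvalues are $\lambda = 1$, $\lambda = 2$ and $\lambda = 3$.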


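3.5 The characteristic equation is
$$\begin{vmatrix} 8-\lambda & 0 & -5 \\ 9 & 3-\lambda & -6 \\ 10 & 0 & -7-\lambda \end{vmatrix} = 0.$$
Taking the transpose gives
$$\begin{vmatrix} 8-\lambda & 9 & 10 \\ 0 & 3-\lambda & 0 \\ -5 & -6 & -7-\lambda \end{vmatrix} = 0,$$
then interchanging rows 1 and 2 gives
$$\begin{vmatrix} 0 & 3-\lambda & 0 \\ 8-\lambda & 9 & 10 \\ -5 & -6 & -7-\lambda \end{vmatrix} = 0.$$
Expanding by the first row gives
$$-(3-\lambda)\begin{vmatrix} 8-\lambda & 10 \\ -5 & -7-\lambda \end{vmatrix} = 0,$$
so $\lambda = 3$ or $(8-\lambda)(-7-\lambda) + 50 = 0$. This quadratic
equation simplifies to $\lambda^2 - \lambda - 6 = 0$, which has roots
$\lambda = 3$ and $\lambda = -2$.


Thus the eigenvalues are $\lambda = 3$ (repeated) and $\lambda = -2$.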


3.6 The characteristic equation of the first matrix is
$$\begin{vmatrix} a-\lambda & 0 & 0 \\ d & b-\lambda & 0 \\ e & f & c-\lambda \end{vmatrix} = 0.$$
Expanding by the top row gives
$$(a-\lambda)\begin{vmatrix} b-\lambda & 0 \\ f & c-\lambda \end{vmatrix} = 0,$$
so $(a-\lambda)(b-\lambda)(c-\lambda) = 0$.


Thus the eigenvalues are $\lambda = a$, $\lambda = b$ and $\lambda = c$.


The second matrix is the transpose of the first, so it has
the same eigenvalues.


3.7 (a) Example 3.2:

sum of eigenvalues = 5 + 3 + 1 = 9
and tr A = 5 + 2 + 2 = 9;
det A = 5 × 3 × 1 = 15.


Example 3.3:

sum of eigenvalues = 6 + 7 + 3 = 16
and tr A = 5 + 5 + 6 = 16;
det A = 6 × 7 × 3 = 126.


(b) Exercise 3.2:

sum of eigenvalues = 1 + 2 + 3 = 6
and tr A = 1 + 2 + 3 = 6;
det A = 1 × 2 × 3 = 6.

Exercise 3.4(a):

sum of eigenvalues = 0 + (−1) + 1 = 0
and tr A = 0 + 0 + 0 = 0;
det A = 0 × (−1) × 1 = 0.

Exercise 3.4(b):

sum of eigenvalues = 1 + 2 + 3 = 6
and tr A = 1 + 2 + 3 = 6;
det A = 1 × 2 × 3 = 6.

Exercise 3.5:

sum of eigenvalues = 3 + 3 + (−2) = 4
and tr A = 8 + 3 + (−7) = 4;
det A = 3 × 3 × (−2) = −18.
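
The trace and determinant checks of Solution 3.7 can also be carried out mechanically. The sketch below is illustrative: it uses the matrices that appear in the characteristic determinants of Solutions 3.4(a) and 3.5, and the helper names `trace` and `det3` are our own.

```python
import math

def trace(m):
    """Sum of the diagonal entries of a square matrix."""
    return sum(m[i][i] for i in range(len(m)))

def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# (matrix, eigenvalues) pairs taken from Solutions 3.4(a) and 3.5.
cases = {
    "3.4(a)": ([[0, 0, 1], [0, 0, 0], [1, 0, 0]], [0, -1, 1]),
    "3.5": ([[8, 0, -5], [9, 3, -6], [10, 0, -7]], [3, 3, -2]),
}

for name, (m, eigenvalues) in cases.items():
    # The sum of the eigenvalues should equal tr A and the product should equal det A.
    print(name, trace(m), sum(eigenvalues), det3(m), math.prod(eigenvalues))
```

For each matrix the two pairs of printed numbers should agree, which is a quick way to catch an arithmetic slip in a hand-calculated set of eigenvalues.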


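3.8 (a) The characteristic equation is
$$\begin{vmatrix} -\lambda & 0 & 1 \\ 0 & 1-\lambda & 0 \\ 1 & 0 & -\lambda \end{vmatrix} = 0.$$
Interchanging rows 1 and 2 gives
$$\begin{vmatrix} 0 & 1-\lambda & 0 \\ -\lambda & 0 & 1 \\ 1 & 0 & -\lambda \end{vmatrix} = 0.$$
Expanding by the first row gives
$$-(1-\lambda)\begin{vmatrix} -\lambda & 1 \\ 1 & -\lambda \end{vmatrix} = 0,$$
so $\lambda = 1$ or $\lambda^2 - 1 = 0$.
Since $\lambda^2 - 1 = (\lambda - 1)(\lambda + 1)$, the eigenvalues are $\lambda = 1$
(repeated) and $\lambda = -1$.
The eigenvector equations are
$$-\lambda x_1 + x_3 = 0, \qquad (1-\lambda)x_2 = 0, \qquad x_1 - \lambda x_3 = 0.$$


$\lambda = 1$: The eigenvector equations become
$$-x_1 + x_3 = 0, \qquad 0 = 0, \qquad x_1 - x_3 = 0,$$
which reduce to the single equation $x_3 = x_1$.
Since $x_2$ can take any value, two linearly independent
eigenvectors are $[0 \;\; 1 \;\; 0]^T$ and $[1 \;\; 0 \;\; 1]^T$.


$\lambda = -1$: The eigenvector equations become
$$x_1 + x_3 = 0, \qquad 2x_2 = 0, \qquad x_1 + x_3 = 0,$$
which reduce to the equations $x_3 = -x_1$ and $x_2 = 0$, so
an eigenvector is $[1 \;\; 0 \;\; -1]^T$.


(b) The characteristic equation is
$$\begin{vmatrix} \tfrac{1}{\sqrt{2}}-\lambda & 0 & \tfrac{1}{\sqrt{2}} \\ 0 & 1-\lambda & 0 \\ -\tfrac{1}{\sqrt{2}} & 0 & \tfrac{1}{\sqrt{2}}-\lambda \end{vmatrix} = 0,$$
which, after interchanging the first and second rows,
simplifies to
$$(\lambda - 1)(\lambda^2 - \sqrt{2}\lambda + 1) = 0.$$
The quadratic equation $\lambda^2 - \sqrt{2}\lambda + 1 = 0$ has roots
$\lambda = \tfrac{1}{\sqrt{2}}(1+i)$ and $\lambda = \tfrac{1}{\sqrt{2}}(1-i)$.
Thus the eigenvalues are $\lambda = 1$, $\lambda = \tfrac{1}{\sqrt{2}}(1+i)$ and
$\lambda = \tfrac{1}{\sqrt{2}}(1-i)$.
The eigenvector equations are
$$\left(\tfrac{1}{\sqrt{2}} - \lambda\right)x_1 + \tfrac{1}{\sqrt{2}}x_3 = 0, \qquad (1-\lambda)x_2 = 0, \qquad -\tfrac{1}{\sqrt{2}}x_1 + \left(\tfrac{1}{\sqrt{2}} - \lambda\right)x_3 = 0.$$


$\lambda = 1$: The eigenvector equations become
$$\left(\tfrac{1}{\sqrt{2}} - 1\right)x_1 + \tfrac{1}{\sqrt{2}}x_3 = 0, \qquad 0 = 0, \qquad -\tfrac{1}{\sqrt{2}}x_1 + \left(\tfrac{1}{\sqrt{2}} - 1\right)x_3 = 0,$$
which reduce to the equations $x_1 = 0$ and $x_3 = 0$, so a
corresponding eigenvector is $[0 \;\; 1 \;\; 0]^T$.


$\lambda = \tfrac{1}{\sqrt{2}}(1+i)$: The eigenvector equations become
$$-\tfrac{1}{\sqrt{2}}ix_1 + \tfrac{1}{\sqrt{2}}x_3 = 0, \qquad \tfrac{1}{\sqrt{2}}(\sqrt{2} - 1 - i)x_2 = 0, \qquad -\tfrac{1}{\sqrt{2}}x_1 - \tfrac{1}{\sqrt{2}}ix_3 = 0,$$
which reduce to the equations $x_3 = ix_1$ and $x_2 = 0$, so
a corresponding eigenvector is $[1 \;\; 0 \;\; i]^T$.


$\lambda = \tfrac{1}{\sqrt{2}}(1-i)$: The eigenvector equations become
$$\tfrac{1}{\sqrt{2}}ix_1 + \tfrac{1}{\sqrt{2}}x_3 = 0, \qquad \tfrac{1}{\sqrt{2}}(\sqrt{2} - 1 + i)x_2 = 0, \qquad -\tfrac{1}{\sqrt{2}}x_1 + \tfrac{1}{\sqrt{2}}ix_3 = 0,$$
which reduce to the equations $x_3 = -ix_1$ and $x_2 = 0$,
so a corresponding eigenvector is $[1 \;\; 0 \;\; -i]^T$.


(The eigenvalue $\tfrac{1}{\sqrt{2}}(1-i)$ and eigenvector $[1 \;\; 0 \;\; -i]^T$
can be obtained from $\tfrac{1}{\sqrt{2}}(1+i)$ and $[1 \;\; 0 \;\; i]^T$, respectively,
by replacing $i$ by $-i$. That is, the second complex
eigenvalue and corresponding eigenvector are the
complex conjugates of the first complex eigenvalue and
corresponding eigenvector.)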


3.9 (a) The matrix is upper triangular, so the eigenvalues
are 2, $-3$ and 4.

$\lambda = 2$: The eigenvector equations become
$$x_2 - x_3 = 0, \qquad -5x_2 + 2x_3 = 0, \qquad 2x_3 = 0,$$
which reduce to $x_2 = x_3 = 0$, so a corresponding eigenvector
is $[1 \;\; 0 \;\; 0]^T$.


$\lambda = -3$: The eigenvector equations become
$$5x_1 + x_2 - x_3 = 0, \qquad 2x_3 = 0, \qquad 7x_3 = 0,$$
which reduce to $5x_1 + x_2 = 0$ and $x_3 = 0$, so a corresponding
eigenvector is $[1 \;\; -5 \;\; 0]^T$.


$\lambda = 4$: The eigenvector equations become
$$-2x_1 + x_2 - x_3 = 0, \qquad -7x_2 + 2x_3 = 0, \qquad 0 = 0.$$
Choosing $x_3 = 14$ keeps the numbers simple, and a corresponding
eigenvector is $[-5 \;\; 4 \;\; 14]^T$.


(b) The characteristic equation is
$$\begin{vmatrix} -\lambda & 2 & 0 \\ -2 & -\lambda & 0 \\ 0 & 0 & 1-\lambda \end{vmatrix} = 0,$$
and (after interchanging the first and third rows) this
gives $(1-\lambda)(\lambda^2 + 4) = 0$, so the eigenvalues are 1, $2i$
and $-2i$.

$\lambda = 1$: The eigenvector equations become
$$-x_1 + 2x_2 = 0, \qquad -2x_1 - x_2 = 0, \qquad 0 = 0,$$
so a corresponding eigenvector is $[0 \;\; 0 \;\; 1]^T$.

$\lambda = 2i$: The eigenvector equations become
$$-2ix_1 + 2x_2 = 0, \qquad -2x_1 - 2ix_2 = 0, \qquad (1 - 2i)x_3 = 0,$$
which reduce to $x_2 = ix_1$ and $x_3 = 0$, so a corresponding
eigenvector is $[1 \;\; i \;\; 0]^T$.

$\lambda = -2i$: Similarly, an eigenvector corresponding to
$\lambda = -2i$ is $[1 \;\; -i \;\; 0]^T$.

(c) The characteristic equation is
$$\begin{vmatrix} 1-\lambda & 0 & 0 \\ 0 & 1-\lambda & 2 \\ 0 & -2 & 5-\lambda \end{vmatrix} = 0,$$
which gives $(1-\lambda)((1-\lambda)(5-\lambda) + 4) = 0$. This simplifies
to $(1-\lambda)(\lambda - 3)^2 = 0$, so the eigenvalues are
$\lambda = 1$ and $\lambda = 3$ (repeated).

$\lambda = 1$: The eigenvector equations become
$$0 = 0, \qquad 2x_3 = 0, \qquad -2x_2 + 4x_3 = 0,$$
which reduce to $x_2 = x_3 = 0$, so $[1 \;\; 0 \;\; 0]^T$ is a corresponding
eigenvector.

$\lambda = 3$: The eigenvector equations become
$$-2x_1 = 0, \qquad -2x_2 + 2x_3 = 0,$$
which reduce to $x_1 = 0$ and $x_2 = x_3$, so a corresponding
eigenvector is $[0 \;\; 1 \;\; 1]^T$. In this case, the repeated
eigenvalue $\lambda = 3$ has only one linearly independent
eigenvector.


(d) The matrix is lower triangular, so the eigenvalues
are $-2$ and 1 (repeated).

$\lambda = -2$: The eigenvector equations become
$$3x_1 = 0, \qquad 3x_2 = 0, \qquad x_1 + x_2 = 0,$$
which reduce to $x_1 = x_2 = 0$, so $[0 \;\; 0 \;\; 1]^T$ is a corresponding
eigenvector.

$\lambda = 1$: The eigenvector equations become
$$0 = 0, \qquad 0 = 0, \qquad x_1 + x_2 - 3x_3 = 0.$$
These equations are satisfied if $x_1 = 3x_3 - x_2$ (whatever
values we choose for $x_2$ and $x_3$). Two linearly
independent eigenvectors can be found. For example,
setting $x_2 = 1$ and $x_3 = 0$ gives $[-1 \;\; 1 \;\; 0]^T$, and setting
$x_2 = 0$ and $x_3 = 1$ gives $[3 \;\; 0 \;\; 1]^T$.


Section 4 
trm-ae-[? L]-E 
n-an-[? IE]-[]: 
o-am-f 30) -(3] 
3 9) [47 219 
ef Ae ki | El - Ee 


42.18) i= Roy i | | = ke 


vote [FE 


a ae=[f 3 [S]=[H]. 


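(b) $\mathbf{e}_{11} = A\mathbf{e}_{10} = [88574 \;\; 88573]^T$.


(c) The characteristic equation is $(2-\lambda)^2 - 1 = 0$, so
$2 - \lambda = \pm 1$, thus $\lambda_1 = 3$ and $\lambda_2 = 1$. Corresponding
eigenvectors are $\mathbf{v}_1 = [1 \;\; 1]^T$ and $\mathbf{v}_2 = [1 \;\; -1]^T$.


(d) We need to determine constants $\alpha$ and $\beta$ so that
$\mathbf{e}_0 = \alpha\mathbf{v}_1 + \beta\mathbf{v}_2$, i.e.
$$\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} \alpha \\ \beta \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}.$$
Solving this equation for $\alpha$ and $\beta$ gives $\alpha = \tfrac{1}{2}$ and $\beta = \tfrac{1}{2}$.
Thus we have
$$\mathbf{e}_0 = \tfrac{1}{2}\mathbf{v}_1 + \tfrac{1}{2}\mathbf{v}_2.$$

(e) $\mathbf{e}_1 = A\mathbf{e}_0 = A\left(\tfrac{1}{2}\mathbf{v}_1 + \tfrac{1}{2}\mathbf{v}_2\right) = \tfrac{1}{2}A\mathbf{v}_1 + \tfrac{1}{2}A\mathbf{v}_2 = \tfrac{3}{2}\mathbf{v}_1 + \tfrac{1}{2}\mathbf{v}_2$,

$\mathbf{e}_2 = A\mathbf{e}_1 = A\left(\tfrac{3}{2}\mathbf{v}_1 + \tfrac{1}{2}\mathbf{v}_2\right) = \tfrac{3}{2}A\mathbf{v}_1 + \tfrac{1}{2}A\mathbf{v}_2 = \tfrac{9}{2}\mathbf{v}_1 + \tfrac{1}{2}\mathbf{v}_2$.


Similarly,
$$\mathbf{e}_n = \tfrac{1}{2}\left(3^n\mathbf{v}_1 + \mathbf{v}_2\right).$$


(f) The coefficient of $\mathbf{v}_1$, i.e. $\tfrac{1}{2} \times 3^n$, dominates the
expression for $\mathbf{e}_n$, so we obtain approximations to $\mathbf{v}_1$
(an eigenvector corresponding to the eigenvalue of larger
magnitude).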
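
The iteration in this solution is easy to reproduce. The sketch below assumes $A$ has rows (2, 1) and (1, 2), which is consistent with the characteristic equation in part (c), and $\mathbf{e}_0 = [1 \;\; 0]^T$, which follows from part (d); the function names are illustrative.

```python
def matvec(A, v):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def iterate(A, e0, n):
    """Return e_n = A^n e_0 without any rescaling."""
    e = list(e0)
    for _ in range(n):
        e = matvec(A, e)
    return e

def closed_form(n):
    """e_n = (1/2)(3^n v1 + v2) with v1 = [1, 1] and v2 = [1, -1]."""
    return [(3 ** n + 1) // 2, (3 ** n - 1) // 2]

A = [[2, 1], [1, 2]]   # inferred from (2 - lambda)^2 - 1 = 0 (an assumption)
e0 = [1, 0]            # e_0 = (1/2) v1 + (1/2) v2

e11 = iterate(A, e0, 11)
print(e11, closed_form(11))
```

The two components differ by exactly 1 at every step while both grow like $3^n/2$, which is why the iterates line up with $\mathbf{v}_1 = [1 \;\; 1]^T$ so quickly.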


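4.3 (a) If $\mathbf{e}_0 = \alpha\mathbf{v}_1 + \beta\mathbf{v}_2$, then
$$\begin{bmatrix} 10000 \\ 8000 \end{bmatrix} = \alpha\begin{bmatrix} 2 \\ 1 \end{bmatrix} + \beta\begin{bmatrix} 1 \\ -1 \end{bmatrix},$$
so $\alpha$ and $\beta$ satisfy the simultaneous equations
$$2\alpha + \beta = 10000,$$
$$\alpha - \beta = 8000.$$
Solving these equations gives $\alpha = 6000$, $\beta = -2000$, so
$$\mathbf{e}_0 = 6000\mathbf{v}_1 - 2000\mathbf{v}_2.$$


(b) Since $A\mathbf{v}_1 = \mathbf{v}_1$ and $A\mathbf{v}_2 = 0.7\mathbf{v}_2$, we have
$$\mathbf{e}_1 = A\mathbf{e}_0 = 6000A\mathbf{v}_1 - 2000A\mathbf{v}_2 = 6000\mathbf{v}_1 - 2000(0.7)\mathbf{v}_2,$$
$$\mathbf{e}_2 = A\mathbf{e}_1 = 6000A\mathbf{v}_1 - 2000(0.7)A\mathbf{v}_2 = 6000\mathbf{v}_1 - 2000(0.7)^2\mathbf{v}_2.$$
Continuing in this way, we obtain
$$\mathbf{e}_n = 6000\mathbf{v}_1 - 2000(0.7)^n\mathbf{v}_2.$$

(c) As $n$ becomes large, $(0.7)^n$ becomes small and the
term in $\mathbf{v}_2$ can be ignored in comparison with the term
in $\mathbf{v}_1$. Thus
$$\mathbf{e}_n \simeq 6000\mathbf{v}_1 = [12000 \;\; 6000]^T,$$
which agrees with our observations in the Introduction.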
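
To see how quickly the term in $\mathbf{v}_2$ dies away, the formula from part (b) can be tabulated. This is a small illustrative sketch; the function name `e_n` is our own, and the eigenvectors are those used in part (a).

```python
V1 = (2, 1)    # eigenvector for eigenvalue 1
V2 = (1, -1)   # eigenvector for eigenvalue 0.7

def e_n(n):
    """Evaluate e_n = 6000 v1 - 2000 (0.7)^n v2 componentwise."""
    decay = 0.7 ** n
    return [6000 * a - 2000 * decay * b for a, b in zip(V1, V2)]

for n in (0, 1, 5, 10, 20, 50):
    print(n, [round(c, 1) for c in e_n(n)])
```

With $n = 0$ the formula reproduces the initial vector, and the later rows approach $[12000 \;\; 6000]^T$ as the factor $(0.7)^n$ shrinks.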


4.4 First iteration: 


II 
st 
ew 
Cm 
i 
rt 
Aloo 
i 

lI 
——— 
= [als 
— i 


Z2 
a2 = 11, 
33 3 
a ']11 1 


Since eg = e1, the third iteration will be identical to the 


second, and e3 is also [3 1]. 


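So $[3 \;\; 1]^T$ is an eigenvector. We have
$$A\begin{bmatrix} 3 \\ 1 \end{bmatrix} = \begin{bmatrix} 33 \\ 11 \end{bmatrix} = 11\begin{bmatrix} 3 \\ 1 \end{bmatrix},$$
so the corresponding eigenvalue is 11. Since $\operatorname{tr} A = 11$,
the other eigenvalue is 0, which explains why the iteration
converges so rapidly.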


(This is a very special case; generally, we would not ex- 
pect e,, to be equal to an eigenvector for any value of n, 
unless we start with an eigenvector.) 


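4.5 We have $A^{-1} = \dfrac{1}{11}\begin{bmatrix} 5 & -3 \\ -8 & 7 \end{bmatrix}$.


First iteration:
$$\mathbf{z}_1 = A^{-1}\mathbf{e}_0 = \frac{1}{11}\begin{bmatrix} 5 & -3 \\ -8 & 7 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} \tfrac{2}{11} \\ -\tfrac{1}{11} \end{bmatrix},$$
therefore $\alpha_1 = \tfrac{2}{11}$ and $\mathbf{e}_1 = [1 \;\; -\tfrac{1}{2}]^T$.

Second iteration:
$$\mathbf{z}_2 = \frac{1}{11}\begin{bmatrix} 5 & -3 \\ -8 & 7 \end{bmatrix}\begin{bmatrix} 1 \\ -\tfrac{1}{2} \end{bmatrix} = \begin{bmatrix} \tfrac{13}{22} \\ -\tfrac{23}{22} \end{bmatrix},$$
therefore $\alpha_2 = -\tfrac{23}{22}$ and $\mathbf{e}_2 = [-\tfrac{13}{23} \;\; 1]^T$.

Third iteration:
$$\mathbf{z}_3 = \frac{1}{11}\begin{bmatrix} 5 & -3 \\ -8 & 7 \end{bmatrix}\begin{bmatrix} -\tfrac{13}{23} \\ 1 \end{bmatrix} = \begin{bmatrix} -\tfrac{134}{253} \\ \tfrac{265}{253} \end{bmatrix},$$
therefore $\alpha_3 = \tfrac{265}{253}$ and $\mathbf{e}_3 = [-\tfrac{134}{265} \;\; 1]^T$.

Taking $\mathbf{e}_3 = [-\tfrac{134}{265} \;\; 1]^T \simeq [-0.506 \;\; 1]^T$ as our approximation
to the eigenvector, the estimate for the
corresponding eigenvalue of $A$ of smallest magnitude
is $1/\alpha_3 = \tfrac{253}{265} \simeq 0.955$. (The exact eigenvector is
$[-0.5 \;\; 1]^T$, corresponding to the eigenvalue 1.)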
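
The three inverse-iteration steps above can be reproduced exactly with rational arithmetic. This is a sketch under stated assumptions: the entries of $A^{-1}$ and the starting vector $[1 \;\; 1]^T$ are as reconstructed in this solution, the scaling rule (divide by the component of largest magnitude) is the one the solution appears to use, and the function name is illustrative.

```python
from fractions import Fraction as F

def inverse_iteration_step(A_inv, e):
    """One step: z = A^{-1} e, alpha = component of z of largest magnitude, e_new = z / alpha."""
    z = [sum(a * x for a, x in zip(row, e)) for row in A_inv]
    alpha = max(z, key=abs)
    return alpha, [zi / alpha for zi in z]

A_inv = [[F(5, 11), F(-3, 11)],
         [F(-8, 11), F(7, 11)]]
e = [F(1), F(1)]   # assumed starting vector e_0

history = []
for _ in range(3):
    alpha, e = inverse_iteration_step(A_inv, e)
    history.append((alpha, e))

for alpha, vec in history:
    print(alpha, vec)

# Estimate of the eigenvalue of A of smallest magnitude after three steps.
estimate = 1 / history[-1][0]
print(float(estimate))
```

Using `Fraction` keeps every intermediate value exact, so the printed values can be compared digit for digit with the fractions in the solution.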


4.6 We follow Procedure 4.3(a).
Twenty-first iteration:
$$\mathbf{z}_{21} = (A + I)^{-1}\mathbf{e}_{20} \simeq [1.725 \;\; -0.243 \;\; -0.654]^T,$$
$$\alpha_{21} = 1.725,$$
$$\mathbf{e}_{21} = \frac{\mathbf{z}_{21}}{\alpha_{21}} \simeq [1 \;\; -0.141 \;\; -0.379]^T.$$


The sequence $\alpha_n$ appears to have converged to 1.725,
and a corresponding eigenvector is approximately
$[1 \;\; -0.141 \;\; -0.379]^T$.


Since $\alpha_n$ converges to $1/(\lambda_1 - p) = 1/(\lambda_1 + 1) = 1.725$,
we have $\lambda_1 = 1/\alpha_{21} - 1 \simeq -0.420$.

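4.7 (a) We follow Procedure 4.1.
First iteration:
$$\mathbf{z}_1 = A\mathbf{e}_0 = \begin{bmatrix} 0.5 & 0.6 \\ 1.4 & -0.3 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0.5 \\ 1.4 \end{bmatrix},$$
$$\alpha_1 = 1.4,$$
$$\mathbf{e}_1 = \frac{\mathbf{z}_1}{\alpha_1} = \frac{1}{1.4}\begin{bmatrix} 0.5 \\ 1.4 \end{bmatrix} = \begin{bmatrix} 0.357143 \\ 1 \end{bmatrix}.$$
Second iteration: $\mathbf{e}_2 = [1 \;\; 0.256881]^T$.
Third iteration: $\mathbf{e}_3 = [0.494452 \;\; 1]^T$.

(b) We follow Procedure 2.1.
The characteristic equation is
$$\lambda^2 - 0.2\lambda - 0.99 = 0,$$
so the eigenvalues are $\lambda_1 = -0.9$ and $\lambda_2 = 1.1$.

$\lambda_1 = -0.9$: The eigenvector equations both become
$$1.4x + 0.6y = 0, \quad\text{i.e.}\quad 3y = -7x,$$
so a corresponding eigenvector is $[3 \;\; -7]^T$.

$\lambda_2 = 1.1$: The eigenvector equations become
$$-0.6x + 0.6y = 0, \qquad 1.4x - 1.4y = 0,$$
which reduce to $x = y$, so a corresponding eigenvector
is $[1 \;\; 1]^T$.


(c) The sequence $\mathbf{e}_n$ will converge to an eigenvector
corresponding to the eigenvalue of larger magnitude,
i.e. to $[1 \;\; 1]^T$.


(d) $A^8\mathbf{v} = \lambda^8\mathbf{v}$.


(e) We express $\mathbf{e}_0$ in terms of the eigenvectors as
$\mathbf{e}_0 = \alpha\mathbf{v}_1 + \beta\mathbf{v}_2$, so
$$\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \alpha\begin{bmatrix} 3 \\ -7 \end{bmatrix} + \beta\begin{bmatrix} 1 \\ 1 \end{bmatrix},$$
giving $\alpha = 0.1$ and $\beta = 0.7$.
In general, $\mathbf{e}_n$ is a multiple of $A^n\mathbf{e}_0$, so we calculate
$$A^8\mathbf{e}_0 = A^8(0.1\mathbf{v}_1 + 0.7\mathbf{v}_2)$$
$$= 0.1(A^8\mathbf{v}_1) + 0.7(A^8\mathbf{v}_2)$$
$$= (0.1)(-0.9)^8\mathbf{v}_1 + (0.7)(1.1)^8\mathbf{v}_2$$
$$\simeq 0.043047\mathbf{v}_1 + 1.500512\mathbf{v}_2$$
$$= [0.129140 \;\; -0.301327]^T + [1.500512 \;\; 1.500512]^T$$
$$= [1.629652 \;\; 1.199185]^T.$$
Dividing by 1.629652, we obtain $\mathbf{e}_8 = [1 \;\; 0.735853]^T$.

(f) $A^{-1} = \begin{bmatrix} 0.303030 & 0.606061 \\ 1.414141 & -0.505051 \end{bmatrix}$.
We follow Procedure 4.2(a).
First iteration:
$$\mathbf{z}_1 = A^{-1}\mathbf{e}_0 = \begin{bmatrix} 0.484848 \\ -1.070707 \end{bmatrix},$$
$$\alpha_1 = -1.070707,$$
$$\mathbf{e}_1 = \frac{\mathbf{z}_1}{\alpha_1} = -\frac{1}{1.070707}\begin{bmatrix} 0.484848 \\ -1.070707 \end{bmatrix} = \begin{bmatrix} -0.452830 \\ 1 \end{bmatrix}.$$
Second iteration:
$\alpha_2 = -1.145416$, $\mathbf{e}_2 = [-0.409318 \;\; 1]^T$.
Third iteration:
$\alpha_3 = -1.083884$, $\mathbf{e}_3 = [-0.444720 \;\; 1]^T$.
This sequence of vectors is converging to $[-\tfrac{3}{7} \;\; 1]^T$
$\simeq [-0.428571 \;\; 1]^T$, an eigenvector corresponding to
the eigenvalue of smallest magnitude, $\lambda = -0.9$.


(g) Convergence will be slow for direct iteration, since
the two eigenvalues of $A$ are $-0.9$ and 1.1, which are
relatively close in magnitude. Similarly, inverse iteration
will also be slow to converge, since the eigenvalues
of $A^{-1}$ are $-1.1111$ and 0.9091, to four decimal places,
and these are relatively close in magnitude.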
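
The slow convergence described in part (g) can be observed directly. The following sketch is illustrative rather than the course's own procedure listing: it assumes $A$ has rows (0.5, 0.6) and (1.4, -0.3), as inferred from parts (a) and (b), starts from $\mathbf{e}_0 = [1 \;\; 0]^T$, and scales each iterate by its component of largest magnitude.

```python
def direct_iteration(A, e, steps):
    """Repeat z = A e, alpha = component of z of largest magnitude, e = z / alpha."""
    alpha = None
    for _ in range(steps):
        z = [sum(a * x for a, x in zip(row, e)) for row in A]
        alpha = max(z, key=abs)
        e = [zi / alpha for zi in z]
    return alpha, e

A = [[0.5, 0.6], [1.4, -0.3]]   # inferred matrix (an assumption)
e0 = [1.0, 0.0]

_, e3 = direct_iteration(A, e0, 3)
_, e8 = direct_iteration(A, e0, 8)
alpha40, e40 = direct_iteration(A, e0, 40)
print(e3, e8)
print(alpha40, e40)
```

The error shrinks by a factor of roughly $0.9/1.1 \simeq 0.82$ per step, so the iterates approach $[1 \;\; 1]^T$ and the scaling factor approaches 1.1 only after a few dozen iterations.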


4.8 We could use Procedure 4.3 to find the third eigenvector,
but without a computer this could be very hard
work. Alternatively, we could find two of the eigenvalues
from the given eigenvectors, and we could then find
the third eigenvalue very easily because the sum of the
eigenvalues is $\operatorname{tr} A$.


We know that $\mathbf{v}_1$ is an eigenvector, so $A\mathbf{v}_1 = \lambda_1\mathbf{v}_1$.
Calculating just the third components, we have
$$(3 \times 0.477) + (4 \times 0.689) + (6 \times 1) = \lambda_1 \times 1,$$
so $\lambda_1 \simeq 10.187$.


Similarly, using the second components of $A\mathbf{v}_2 = \lambda_2\mathbf{v}_2$,
we have
$$(2 \times (-0.102)) + (3 \times 1) + (4 \times (-0.641)) = \lambda_2 \times 1,$$
so $\lambda_2 \simeq 0.232$.
Since $\operatorname{tr} A = 1 + 3 + 6 = 10$,
$$\lambda_3 = 10 - \lambda_1 - \lambda_2 \simeq -0.419.$$


UNIT 11 Systems of differential 
equations 


Study guide for Unit 11 


Before reading this unit, you will need to be familiar with the main ideas of 
Units 2 and 3 on differential equations. You will also need to be familiar 
with the properties of matrices and determinants, as described in Unit 9, and 
with the material on eigenvalues and eigenvectors in Unit 10. The material 
in this unit will be needed for later units in the course — in particular, 
Unit 18. 


Section 1 is intended to show you how systems of differential equations arise 
from modelling. It should not take you very long to study, since there are 
no exercises. 


Section 2 is the most important part of the unit, and depends heavily on 
Unit 10. In particular, you should make sure that you understand Subsec- 
tion 2.1 before proceeding. 


Subsection 3.1 of Section 3 is more theoretical, and you should make sure 
that you understand the statements of the two theorems. In Subsection 3.2, 
you should concentrate on the first part of the subsection rather than the 
exceptional cases that come later. 


Section 4 is a straightforward section in which the ideas from previous sec- 
tions are applied to a different type of problem. 


Section 5 involves use of the computer algebra package for the course.


Introduction


You have already met several types of linear differential equation in this 
course. For example, in Unit 2 you met linear first-order differential equa- 
tions of the form 


$$\frac{dy}{dx} + g(x)y = h(x),$$


where g(x) and h(x) are given functions, and in Unit 3 you met linear 
constant-coefficient second-order differential equations of the form 
$$a\frac{d^2y}{dx^2} + b\frac{dy}{dx} + cy = f(x),$$


where a, b and c are constants, and f(x) is a given function.


We now turn our attention to systems of linear differential equations relating 
two or more functions and their derivatives. A simple example of such a 
system is 


$$\dot{x} = 3x + 2y,$$
$$\dot{y} = x + 4y,$$


where $x(t)$ and $y(t)$ are both functions of the independent variable $t$. You
can think of $(\dot{x}, \dot{y}) = (\dot{x}(t), \dot{y}(t))$ as the velocity at time $t$ of a particle at
position $(x, y) = (x(t), y(t))$. If we are given the above system and an initial
condition for the position of the particle at time t = 0, then solving the 
differential equations will give us the position (x(t), y(t)) of the particle at 
any subsequent time t. 


In Subsection 2.1 you will see that we can solve these differential equations
by writing the second as $x = \dot{y} - 4y$ and substituting this expression into the
first to give a single second-order differential equation in $y$. However, this
method does not extend easily to more complicated systems, such as those 
involving three differential equations in three unknowns. In this unit we 
develop techniques for solving such systems of linear differential equations 
with constant coefficients. We shall see that such systems can be written in 
matrix form, and that we can solve them by calculating the eigenvalues and 
eigenvectors of the resulting coefficient matrix. 


We also examine systems of equations involving second derivatives, such as 


$$\ddot{x} = x + 4y,$$
$$\ddot{y} = x - 2y,$$


where you can think of $(\ddot{x}, \ddot{y}) = (\ddot{x}(t), \ddot{y}(t))$ as the acceleration at time $t$ of
a particle at position $(x, y) = (x(t), y(t))$.


In Section 1 we show how various situations can be modelled by a system 
of linear differential equations. In Section 2 we show how such a system 
can be written in matrix form, and use eigenvalues and eigenvectors to solve 
it when the equations are homogeneous with constant coefficients. This 
discussion spills over into Section 3, where we discuss the inhomogeneous 
case, and into Section 4, where we show how similar techniques can be used 
to solve certain systems of second-order differential equations. In Section 5 
we use the computer to apply many of the techniques of this unit to various 
situations. 


Although many of the problems that we consider in this unit may appear
to be rather restricted in scope, the type of system discussed here arises
surprisingly often in practice. In particular, systems of linear constant-coefficient
differential equations occur frequently in modelling situations,
especially when we need to make simplifying assumptions about the situations
involved. In Unit 13, we shall consider certain types of non-linear
systems.


Recall that $\dot{x}$ means $dx/dt$ and $\dot{y}$ means $dy/dt$.


Conversely, we can always write a single second-order differential equation
as a pair of first-order equations: for example, writing $y = \dot{x}$, we
can replace $\ddot{x} - 3\dot{x} - 2x = 0$ by the system
$$\dot{x} = y,$$
$$\dot{y} = 2x + 3y.$$


1 Systems of differential equations as 
models 


In this short motivational section, you will see how systems of linear differ- 
ential equations arise in the process of modelling three rather different types 
of situation. 


We shall not give here full details of the modelling process which is required 
in each case, since the aim is to give a fairly rapid impression of where 
systems of differential equations might occur in practice. You should not 
spend too much time dwelling on the details. 


1.1 Conflicts 


Consider a situation in which two different groups are in direct competition
for survival. In a military context, the individual members of these groups
might be humans (soldiers, say) or they might be tanks, ships or aircraft. In
the absence of any external means of stopping the conflict, a battle unfolds
by a process of attrition, in which individual members of the two groups are
in one way or another rendered inactive (in the case of humans, killed or
severely wounded). The battle terminates when one side or the other has
lost all of its active members.


The model described in this subsection was first published by F. W. Lanchester in 1914.


What factors affect who will ‘win’ such a conflict? Other things being equal, 
we would expect a larger group to prevail over a smaller one, so the size of 
each group is important. However, it is often the case that one side is 
more effective per member than the other. Militarily, this effectiveness is 
determined by the choice and design of the weaponry used, and a recognized 
measure of this effectiveness is the abhorrent term kill rate, that is, the rate 
at which single members of one group can, on average, render members of 
the other group inactive. 


For two groups of equal initial size, the more effective group will, on average, 
win a battle. But what will occur when group X is numerically larger than 
group Y but has inferior weaponry? We shall describe a simple model which 
is capable of providing a first answer to this question. 


The model is a continuous one: it approximates the actual situation, in
which the active group size at any time is an integer, by assuming that
the group size is capable of continuous variation. This is a very reasonable
approximation if each group has a large number of members.


Let the active sizes of groups X and Y be denoted by $x$ and $y$, respectively.
These sizes vary with time $t$, so $x(t)$ represents the active size of group X
at time $t$, and similarly for $y(t)$. Suppose that the constant kill rates of the
two groups are $\alpha$ for group X, and $\beta$ for group Y, where $\alpha$ and $\beta$ are both
positive.


We suppose that the rate of reduction of each group is proportional to the
size of the other, so
$$\frac{dx}{dt} = -\beta y \quad\text{and}\quad \frac{dy}{dt} = -\alpha x.$$
This pair of equations can be written alternatively as
$$\dot{x} = -\beta y,$$
$$\dot{y} = -\alpha x, \qquad (1.1)$$

which is a system of two first-order differential equations. Neither of these 
equations is soluble directly by the methods of Unit 2, because they are 
coupled; that is, the equation which features the derivative of one of the 
variables x also includes the other variable y on the right-hand side, and 
vice versa. In order to solve the first equation for x, we need to know 
explicitly what the function y(t) is, and similarly for the second equation. 


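You will appreciate in Section 2 that this system of equations may be solved
in terms of the eigenvalues and eigenvectors of the matrix
$$\begin{bmatrix} 0 & -\beta \\ -\alpha & 0 \end{bmatrix},$$
which arises from expressing Equations (1.1) in the matrix form
$$\begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix} = \begin{bmatrix} 0 & -\beta \\ -\alpha & 0 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}.$$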


There are also other methods of solution. However, without solving this
system of differential equations, it is possible to deduce an interesting
conclusion about such conflicts. Suppose that we multiply both sides of the
first equation of (1.1) by $\alpha x$, then both sides of the second equation by $\beta y$,
and finally subtract the resulting equations. This produces
$$\alpha x\dot{x} - \beta y\dot{y} = 0,$$
which, by the Chain Rule, can also be expressed as
$$\tfrac{1}{2}\alpha\frac{d}{dt}(x^2) - \tfrac{1}{2}\beta\frac{d}{dt}(y^2) = 0.$$
Integration with respect to time then gives
$$\alpha x^2 - \beta y^2 = c,$$
where $c$ is a constant. If the initial sizes of the two groups (at time $t = 0$)
are respectively $x_0$ and $y_0$, then the value of $c$ is $\alpha x_0^2 - \beta y_0^2$, and we have
$$\alpha x^2 - \beta y^2 = \alpha x_0^2 - \beta y_0^2 \qquad (1.2)$$
throughout the conflict.


This relationship allows us to predict when two sides are equally matched
in a conflict. Neither side wins if both have their size reduced to zero at
the same time, that is, if $x = 0$ when $y = 0$. In this case, we must have
$\alpha x^2 - \beta y^2 = 0$ throughout the conflict, so at the start $\alpha x_0^2 = \beta y_0^2$. This
reasoning led Lanchester to define the fighting strength of a force as ‘the
square of its size multiplied by the kill rate of its individual units’.


According to this definition, a force which outnumbers an adversarial force
by only half as much again is more than twice as strong, assuming equal 
effectiveness on both sides. For any potential conflict, the initial fighting 
strengths of the two sides can be estimated, and according to Equation (1.2) 
any difference in these strengths remains constant throughout the conflict. 
At the end, the size and therefore the strength of the loser is zero, so this 
difference is equal to the remaining strength of the winning side. From this, 
the number of survivors can be predicted. 
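

The prediction of survivors from Equation (1.2) can be sketched in a few lines. The code below is illustrative only: the function names and the numerical values of the group sizes and kill rates are hypothetical, chosen to show the 'half as much again' case and a case where a smaller group has better weaponry.

```python
import math

def fighting_strength(size, kill_rate):
    """Lanchester's fighting strength: kill rate times the square of the size."""
    return kill_rate * size ** 2

def survivors(x0, y0, alpha, beta):
    """Predict (winner, remaining size) from alpha x^2 - beta y^2 = alpha x0^2 - beta y0^2."""
    difference = fighting_strength(x0, alpha) - fighting_strength(y0, beta)
    if difference > 0:
        return "X", math.sqrt(difference / alpha)   # y has reached zero
    if difference < 0:
        return "Y", math.sqrt(-difference / beta)   # x has reached zero
    return None, 0.0                                # equally matched

# Hypothetical data: X outnumbers Y by half as much again, equal kill rates.
print(fighting_strength(150, 1.0) / fighting_strength(100, 1.0))
print(survivors(150, 100, 1.0, 1.0))

# Hypothetical data: X is twice as large as Y but has a fifth of the kill rate.
print(survivors(200, 100, 0.01, 0.05))
```

In the first case the strength ratio is 2.25 and about 112 of the 150 members of X are predicted to survive; in the second, the smaller group Y wins despite being outnumbered two to one.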


The model is of course very simple, and will not be applicable in exact form 
for any particular conflict. Nevertheless, the rule of thumb which it provides 
is an informative one, and is taken into account by, for example, military 
strategists. 


1.2 Two-compartment drug model


The administration of clinical drugs is a complex task. The doctor must 
ensure that the concentration of the drug in the patient’s blood remains 
between certain upper and lower limits. One model that has been developed 
to assist in understanding the process is given by the following differential 
equation, in which c(t) represents the concentration of drug in the patient’s 
blood (as a function of time): 


$$\frac{dc}{dt} = \frac{q}{V} - \lambda c.$$


Here the drug enters the bloodstream at a constant rate $q$, $V$ is the ‘apparent
volume of distribution’, and $\lambda$ is a constant (which may be interpreted as
the proportionate rate at which the kidneys excrete the drug). You are not
expected to understand how this equation is derived.


Such a constant rate q of drug input to the bloodstream may be achieved 
by intravenous infusion, but this requires the patient to be connected to 
certain apparatus and is consequently inconvenient. A modern alternative is 
to use slow-release capsules which are taken orally. These capsules gradually 
dissolve within the stomach and by so doing raise the drug concentration 
there. The drug reaches the bloodstream from the stomach by a process 
which may be represented as passing through a membrane which separates 
the stomach ‘compartment’ from the bloodstream ‘compartment’. Later 
on the drug is excreted from the kidneys, as before. The whole process is 
indicated diagrammatically in Figure 1.1. 


We shall now indicate briefly how to model such a two-compartment situation.
We make the following assumptions.

(a) The slow-release capsule has the effect of providing a constant rate of 
input of the drug to the stomach, until the capsule is completely dis- 
solved. 

(b) The concentrations of drug within the stomach compartment and within 
the bloodstream compartment are each uniform at any instant of time. 

(c) The rate at which the drug passes from the stomach to the bloodstream 
is proportional to the difference in concentrations between them. 

(d) The rate of excretion from the bloodstream via the kidneys is (as in the 
earlier model) proportional to the concentration in the bloodstream. 


Suppose that the drug concentrations in the stomach and bloodstream at 
time t are denoted by x(t) and y(t), respectively, and consider the period af- 
ter the slow-release capsule has been swallowed but before it has completely 
dissolved. 


Lanchester applied this model to the situation at the Battle of Trafalgar
(1805), and was able to demonstrate why Nelson’s tactic of splitting the
opposition fleet into two parts might have been expected to succeed, even
with a smaller total number of ships.


Figure 1.1 Two-compartment model: drug input → stomach → bloodstream → drug excreted


Assumption (c) is based upon Fick’s law, which is an empirical result stating
that ‘the amount of material passing through a membrane is proportional to
the difference in concentration between the two sides of the membrane’.



The drug concentration in the stomach will be raised at a constant rate,
$k_1$ say, by Assumption (a), but simultaneously lowered at a rate $k_2(x - y)$,
where $k_2$ is a constant, by Assumption (c). Hence we have
$$\frac{dx}{dt} = k_1 - k_2(x - y).$$


The drug concentration in the blood will be raised at a rate $k_3(x - y)$, by
Assumption (c), but also lowered at a rate $k_4 y$, by Assumption (d) (where
$k_3$ and $k_4$ are constants). This gives
$$\frac{dy}{dt} = k_3(x - y) - k_4 y.$$


The drug concentrations in the two compartments are therefore governed
within the model by the pair of differential equations
$$\dot{x} = -k_2 x + k_2 y + k_1,$$
$$\dot{y} = k_3 x - (k_3 + k_4)y, \qquad (1.3)$$
which can be written in matrix form as
$$\begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix} = \begin{bmatrix} -k_2 & k_2 \\ k_3 & -(k_3 + k_4) \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} k_1 \\ 0 \end{bmatrix}.$$

Figure 1.2 sketches the type of behaviour which this model predicts for 
the drug concentration y(t) in the bloodstream. It is characterized by an 
absorption phase followed by a steady decline as the drug is eliminated. The 
graph for the drug concentration x(t) in the stomach is of similar shape. 
The drug concentrations do not approach zero for large values of $t$ so long
as $k_1$ is non-zero. In reality, a slow-release capsule will eventually dissolve
completely, and then the patient must swallow another capsule in order to 
prevent the drug concentration from falling below some predetermined level. 
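
The remark that the concentrations do not approach zero while $k_1$ is non-zero can be checked by finding the steady state of system (1.3), that is, the values of x and y at which both derivatives vanish. The following is a minimal sketch; the numerical rate constants are hypothetical values chosen only for illustration and do not come from the text.

```python
def steady_state(k1, k2, k3, k4):
    """Steady state of  x' = -k2*x + k2*y + k1,  y' = k3*x - (k3 + k4)*y.

    Setting both derivatives to zero and solving by hand gives
    y = k1*k3/(k2*k4) and x = k1*(k3 + k4)/(k2*k4).
    """
    y = k1 * k3 / (k2 * k4)
    x = k1 * (k3 + k4) / (k2 * k4)
    return x, y


def rhs(x, y, k1, k2, k3, k4):
    """Right-hand sides of system (1.3)."""
    return (-k2 * x + k2 * y + k1, k3 * x - (k3 + k4) * y)


# Hypothetical rate constants (assumed for illustration only).
k = (1.0, 0.5, 0.4, 0.2)
x_s, y_s = steady_state(*k)
dx, dy = rhs(x_s, y_s, *k)   # both should vanish at the steady state
```

For positive rate constants the matrix of coefficients has negative trace and determinant $k_2 k_4 > 0$, so both eigenvalues have negative real part and, within the model, the concentrations settle towards these non-zero values while the capsule is still dissolving.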


The eigenvalues and eigenvectors of the matrix of coefficients 


\[ \begin{bmatrix} -k_2 & k_2 \\ k_3 & -(k_3 + k_4) \end{bmatrix} \]


provide a starting point for solving the system of equations (1.3). However, 
the form of this system differs from that which was derived in the previous 
subsection. The presence of the term $k_1$ on the right-hand side of the first
equation, which is non-zero and does not depend upon x or y, makes this
system inhomogeneous, whereas system (1.1) in Subsection 1.1 is homoge- 
neous, since it has no such term present. You will see shortly that, as with 
ordinary differential equations, the solution of an inhomogeneous system 
is related to, but slightly more complicated than, that of a homogeneous 
system. 


1.3 Motion of a ball-bearing in a bowl 


The previous examples each led to a system of two first-order linear differen- 
tial equations, homogeneous in the case of the conflict model, and inhomo- 
geneous for the two-compartment drug model. We now look at an example 
where the system which arises involves second-order linear differential equa- 
tions. 


When a ball-bearing (that is, a small metallic ball) is placed in a bowl and set 
in motion, it tends to roll along the surface of the bowl for some considerable 
time before coming to rest. We shall now indicate how this motion might 
be modelled for a generalized ‘bowl’. 
We might expect that $k_2 = k_3$
if all the drug leaving the
stomach enters the
bloodstream, but this is not
one of our assumptions.


[Figure 1.2: sketch of the drug concentration y(t) in the bloodstream against time t]


You will study methods of 
solution in the following two 
sections. 


However, system (1.3) will 
become homogeneous if the 
slow-release capsule dissolves 
completely and is not 
replaced. This corresponds to 
putting $k_1 = 0$.




To simplify matters, we model the ball-bearing as a particle (so that the 
rolling aspect of the motion is ignored), and assume that the surface of 
the bowl is frictionless and that there is no air resistance. To describe the 
surface, suppose that a three-dimensional Cartesian coordinate system is 
chosen with the (x, y)-plane horizontal, the z-axis vertical, and the lowest 
point of the bowl’s surface at the origin, (0,0,0). We assume further that 
in the vicinity of the origin, the surface may be described by an equation of 
the form 


\[ z = \tfrac{1}{2} a x^2 + b x y + \tfrac{1}{2} c y^2, \tag{1.4} \]


where the constants a, b, c satisfy the conditions $ac - b^2 > 0$ and $a > 0$.
The first condition here ensures that the surface under consideration is 
a paraboloid, for which any vertical cross-section through the origin is a 
parabola, while the second condition means that these cross-sectional parabo- 
las are all concave upwards rather than downwards. The surface is sketched 
in Figure 1.3. 


It may seem restrictive to specify the particular form of surface given by 
Equation (1.4). However, it turns out that many other functions z = f(x, y) 
with a minimum value z = 0 at (0,0) can be approximated satisfactorily by 
a function of the form (1.4) near to the origin. 


Since the surface is assumed to be frictionless and there is no air resistance, 
the only forces which act upon the particle are its weight $\mathbf{W} = -mg\mathbf{k}$ (where
m is its mass and g is the acceleration due to gravity) and the normal reaction
$\mathbf{N}$ from the surface. In order to describe the latter, we need to be able to
write down a vector which is normal to the surface given by Equation (1.4)
at any point (x, y, z) on the surface. One such vector, pointing inwards, is

\[ -(ax + by)\mathbf{i} - (bx + cy)\mathbf{j} + \mathbf{k}. \]

Hence the normal reaction is

\[ \mathbf{N} = C\bigl(-(ax + by)\mathbf{i} - (bx + cy)\mathbf{j} + \mathbf{k}\bigr), \]

where C is some positive quantity. Newton's second law then gives

\[ m\ddot{\mathbf{r}} = \mathbf{N} + \mathbf{W} = -C(ax + by)\mathbf{i} - C(bx + cy)\mathbf{j} + (C - mg)\mathbf{k}, \]

where $\mathbf{r} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$ is a position vector of a point on the surface, relative
to the origin. Resolving in the i-, j- and k-directions, we obtain

\[ m\ddot{x} = -C(ax + by), \qquad m\ddot{y} = -C(bx + cy), \qquad m\ddot{z} = C - mg. \]

On eliminating the quantity C between these equations, and dividing through
by m, we have

\[ \ddot{x} = -(ax + by)(g + \ddot{z}) \quad\text{and}\quad \ddot{y} = -(bx + cy)(g + \ddot{z}). \]


For motions which do not move too far from the lowest point of the bowl, the
vertical component of acceleration, $\ddot{z}$, will be small in magnitude compared
with g, so to a good approximation the horizontal motion of the particle is
governed by the pair of equations

\[
\begin{cases}
\ddot{x} = -g(ax + by), \\
\ddot{y} = -g(bx + cy).
\end{cases}
\]

This motion is what would be observed if you looked down onto the surface 
from some distance above it, as indicated in Figure 1.4. 


Figure 1.3 


You will be able to appreciate 
this point more fully after 
studying the topic of 
two-variable Taylor 
polynomial approximations in 
Unit 12. 


We ask you to take this on 
trust. 


[Figure 1.4: the horizontal motion of the particle as seen from above the surface]
For example, if the surface is given by

\[ z = 0.25x^2 - 0.4xy + 0.25y^2, \]

then the corresponding equations for horizontal motion (taking $10\,\mathrm{m\,s^{-2}}$ as
an approximation to g) are

\[
\begin{cases}
\ddot{x} = -5x + 4y, \\
\ddot{y} = 4x - 5y.
\end{cases}
\tag{1.5}
\]


These differential equations are linear, constant-coefficient, homogeneous 
and second-order. You will see them solved, and the possible motions inter- 
preted, in Subsection 4.2. As you will see, their solutions can be expressed 
in terms of the eigenvalues and eigenvectors of the matrix 


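\[ \begin{bmatrix} -5 & 4 \\ 4 & -5 \end{bmatrix}, \]

which arises from expressing the pair of equations (1.5) in matrix form as

\[
\begin{bmatrix} \ddot{x} \\ \ddot{y} \end{bmatrix}
= \begin{bmatrix} -5 & 4 \\ 4 & -5 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}.
\]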
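
As a preview of the quantities that Subsection 4.2 relies on, the eigenvalues of this matrix can be computed from its characteristic equation. The sketch below is illustrative only: the helper names are hypothetical, and it assumes a 2 x 2 matrix whose eigenvalues are real.

```python
import math


def eig_2x2(a, b, c, d):
    """Real eigenvalues of [[a, b], [c, d]] from the characteristic equation
    lambda^2 - (a + d)*lambda + (a*d - b*c) = 0 (assumes a real discriminant)."""
    tr, det = a + d, a * d - b * c
    root = math.sqrt(tr * tr - 4 * det)
    return (tr + root) / 2, (tr - root) / 2


def mat_vec(A, v):
    """Product of a matrix (list of rows) with a vector (list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


A = [[-5, 4],
     [4, -5]]
lam1, lam2 = eig_2x2(-5, 4, 4, -5)

# Candidate eigenvectors, checked directly against A v = lambda v.
check1 = mat_vec(A, [1, 1]) == [lam1 * 1, lam1 * 1]
check2 = mat_vec(A, [1, -1]) == [lam2 * 1, lam2 * -1]
```

Both eigenvalues come out negative, which is consistent with the oscillatory motion about the lowest point of the bowl that a second-order system of this kind would be expected to produce.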


2 First-order homogeneous systems 


In this unit we discuss systems of equations involving either two functions
of time x(t) and y(t), or three functions of time x(t), y(t), z(t) (which we
abbreviate to x, y and z). For larger systems it is often convenient to
denote the functions by subscripts, for example $x_1, x_2, \ldots, x_n$, but we shall
not use this notation here because we wish to use subscripts later for another
purpose. Throughout, we write

\[
\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}
\quad\text{or}\quad
\mathbf{x} = \begin{bmatrix} x \\ y \\ z \end{bmatrix},
\]

as appropriate.


In Unit 9 you saw that any system of linear equations can be written in 
matrix form. For example, the equations 


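\[
\begin{cases}
3x + 2y = 5, \\
x + 4y = 5,
\end{cases}
\]

can be written in matrix form as

\[
\begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} 5 \\ 5 \end{bmatrix},
\]

that is, as $\mathbf{A}\mathbf{x} = \mathbf{b}$, where

\[
\mathbf{A} = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}, \quad
\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}
\quad\text{and}\quad
\mathbf{b} = \begin{bmatrix} 5 \\ 5 \end{bmatrix}.
\]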


In a similar way, we can write systems of linear differential equations in 
matrix form. To see what is involved, consider the system 


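\[
\begin{cases}
\dot{x} = 3x + 2y + 5t, \\
\dot{y} = x + 4y + 5,
\end{cases}
\tag{2.1}
\]

which can be written in matrix form as

\[
\begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix}
= \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
+ \begin{bmatrix} 5t \\ 5 \end{bmatrix},
\]

that is, as $\dot{\mathbf{x}} = \mathbf{A}\mathbf{x} + \mathbf{h}$, where
$\mathbf{A} = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}$,
$\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}$ and
$\mathbf{h} = \begin{bmatrix} 5t \\ 5 \end{bmatrix}$.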
Here x and y are functions
of t.


Here h is generally a function
of t, but A is constant.
Note that $\dot{\mathbf{x}} = [\dot{x} \;\; \dot{y}]^T$.




We can similarly represent systems of three, or more, linear differential equa- 
tions in matrix form. For example, the system 


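\[
\begin{cases}
\dot{x} = 3x + 2y + 2z + e^t, \\
\dot{y} = 2x + 2y + 2e^t, \\
\dot{z} = 2x + 4z,
\end{cases}
\]

can be written in matrix form as $\dot{\mathbf{x}} = \mathbf{A}\mathbf{x} + \mathbf{h}$, where

\[
\mathbf{A} = \begin{bmatrix} 3 & 2 & 2 \\ 2 & 2 & 0 \\ 2 & 0 & 4 \end{bmatrix}, \quad
\mathbf{x} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}
\quad\text{and}\quad
\mathbf{h} = \begin{bmatrix} e^t \\ 2e^t \\ 0 \end{bmatrix}
= \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix} e^t.
\tag{2.2}
\]

Note that here $\dot{\mathbf{x}} = [\dot{x} \;\; \dot{y} \;\; \dot{z}]^T$.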
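Definition

A matrix equation of the form $\dot{\mathbf{x}} = \mathbf{A}\mathbf{x} + \mathbf{h}$ is said to be homogeneous
if $\mathbf{h} = \mathbf{0}$, and inhomogeneous otherwise.

Note that in an inhomogeneous system, some, but not all, of the components
of h may be 0.

For example, the system

\[
\begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix}
= \begin{bmatrix} 2 & 3 \\ 2 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
\]

is homogeneous, whereas systems (2.1) and (2.2) are inhomogeneous.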


*Exercise 2.1 


Write each of the following systems in matrix form, and classify it as homo- 
geneous or inhomogeneous. 


b=2c0+y41 t= eae 
(a) ‘ie 7 ak (b) LS (c) (y= x+2y4+ 2 
Z= £t yt2z 


In this section we present an algebraic method for solving homogeneous 
systems of linear first-order differential equations with constant coefficients. 
In Section 3 we show how this method can be adapted to inhomogeneous 
systems. 


2.1 The eigenvalue method 


Our intention is to develop a method for finding the general solution of any
homogeneous system $\dot{\mathbf{x}} = \mathbf{A}\mathbf{x}$, however large, but we shall start with a 2 x 2
matrix A. We begin by solving a system of differential equations using a
non-matrix method, then we look at the same example again, but this time 
the emphasis is on the links to matrices. 


Suppose that we wish to find a solution of the system 


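\[ \dot{x} = 3x + 2y, \tag{2.3} \]
\[ \dot{y} = x + 4y. \tag{2.4} \]

Equation (2.4) gives $x = \dot{y} - 4y$, so $\dot{x} = \ddot{y} - 4\dot{y}$. Substituting into (2.3), we
obtain

\[ \ddot{y} - 4\dot{y} = 3(\dot{y} - 4y) + 2y, \]

which simplifies to give

\[ \ddot{y} - 7\dot{y} + 10y = 0. \]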
Using the methods of Unit 3, we see that the auxiliary equation is

\[ \lambda^2 - 7\lambda + 10 = 0, \]

with roots $\lambda = 5$ and $\lambda = 2$, from which we have $y = \alpha e^{5t} + \beta e^{2t}$ (for arbitrary
constants $\alpha$ and $\beta$). Having found y, we can substitute for y and $\dot{y}$ in
Equation (2.4) and obtain $x = \alpha e^{5t} - 2\beta e^{2t}$, so the solution is

\[ x = \alpha e^{5t} - 2\beta e^{2t} \quad\text{and}\quad y = \alpha e^{5t} + \beta e^{2t}. \]


Thus we have found the general solution of our original system of two equa- 
tions, and this solution contains two arbitrary constants, as expected (one 
for each derivative). 
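
A quick numerical check can confirm that this general solution satisfies Equations (2.3) and (2.4) for any choice of the two constants. The sketch below uses exact term-by-term derivatives; the function names and the sample values of the constants are hypothetical, chosen only for illustration.

```python
import math


def solution(alpha, beta, t):
    """x(t), y(t) from the general solution found above."""
    x = alpha * math.exp(5 * t) - 2 * beta * math.exp(2 * t)
    y = alpha * math.exp(5 * t) + beta * math.exp(2 * t)
    return x, y


def derivative(alpha, beta, t):
    """Exact derivatives of x(t) and y(t), obtained term by term."""
    dx = 5 * alpha * math.exp(5 * t) - 4 * beta * math.exp(2 * t)
    dy = 5 * alpha * math.exp(5 * t) + 2 * beta * math.exp(2 * t)
    return dx, dy


def residuals(alpha, beta, t):
    """Left side minus right side of x' = 3x + 2y and y' = x + 4y."""
    x, y = solution(alpha, beta, t)
    dx, dy = derivative(alpha, beta, t)
    return dx - (3 * x + 2 * y), dy - (x + 4 * y)


# Arbitrary sample constants and time (assumed values).
r1, r2 = residuals(1.5, -0.7, 0.3)
```

Both residuals should be zero up to rounding error, whatever values are chosen for the constants.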


Definitions 


(a) The general solution of a system of n linear constant-coefficient 
first-order differential equations is a collection of all possible solu- 
tions of the system of equations. 


(b) A particular solution of a system of n linear constant-coefficient 
first-order differential equations is a solution containing no arbi- 
trary constants and satisfying given conditions. 


Usually, the general solution of a system of n first-order differential equations 
contains n arbitrary constants. 


The above method does not extend very well to larger systems. However,
notice that the above general solution is a linear combination of exponential
terms of the form $e^{\lambda_1 t}$ and $e^{\lambda_2 t}$ (where in the above case $\lambda_1 = 5$ and
$\lambda_2 = 2$). This suggests an alternative approach to solving such systems of
equations. In the above case we have a solution $x = e^{5t}$, $y = e^{5t}$ corresponding
to choosing $\alpha = 1$ and $\beta = 0$, and another solution $x = -2e^{2t}$, $y = e^{2t}$
corresponding to choosing $\alpha = 0$ and $\beta = 1$. Thus the general solution is
a linear combination of much simpler solutions, and this suggests that we
might be able to solve such systems by looking for these simpler solutions.
So the idea is to search for solutions of the form $x = Ce^{\lambda t}$, $y = De^{\lambda t}$, and
then form linear combinations of such solutions in order to find the general
solution. There is only one minor problem: we have to convince ourselves
that a linear combination of two solutions is itself a solution.


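Suppose that $\mathbf{x}_1$ and $\mathbf{x}_2$ are solutions of the matrix equation $\dot{\mathbf{x}} = \mathbf{A}\mathbf{x}$; then
$\dot{\mathbf{x}}_1 = \mathbf{A}\mathbf{x}_1$ and $\dot{\mathbf{x}}_2 = \mathbf{A}\mathbf{x}_2$. If $\alpha$ and $\beta$ are arbitrary constants, then

\[
\frac{d}{dt}(\alpha\mathbf{x}_1 + \beta\mathbf{x}_2)
= \alpha\dot{\mathbf{x}}_1 + \beta\dot{\mathbf{x}}_2
= \alpha\mathbf{A}\mathbf{x}_1 + \beta\mathbf{A}\mathbf{x}_2
= \mathbf{A}(\alpha\mathbf{x}_1 + \beta\mathbf{x}_2),
\]

so $\alpha\mathbf{x}_1 + \beta\mathbf{x}_2$ is indeed a solution. This is a particular case of a more general
result known as the principle of superposition that you will meet later (in
Theorem 3.1).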


We apply the above technique in the following example. 


Example 2.1 


(a) For the following pair of equations, find a solution of the form $x = Ce^{\lambda t}$,
$y = De^{\lambda t}$, and hence find the general solution:

\[
\begin{cases}
\dot{x} = 3x + 2y, \\
\dot{y} = x + 4y.
\end{cases}
\tag{2.5}
\]

(b) Find the particular solution for which x(0) = 1 and y(0) = 4.
Compare the definitions for
the single equation cases in 
Unit 2, Section 1. 


You met the principle of 
superposition in the case of 
second-order differential 
equations in Unit 3. 




Solution 


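(a) We investigate possible solutions of the form

\[ x = Ce^{\lambda t}, \quad y = De^{\lambda t}, \]

where C and D are constants. Since $\dot{x} = C\lambda e^{\lambda t}$ and $\dot{y} = D\lambda e^{\lambda t}$, substituting
the expressions for x and y into Equations (2.5) gives

\[
\begin{cases}
C\lambda e^{\lambda t} = 3Ce^{\lambda t} + 2De^{\lambda t}, \\
D\lambda e^{\lambda t} = Ce^{\lambda t} + 4De^{\lambda t}.
\end{cases}
\]

Cancelling the $e^{\lambda t}$ terms, we obtain the simultaneous linear equations

\[
\begin{cases}
C\lambda = 3C + 2D, \\
D\lambda = C + 4D,
\end{cases}
\]

which can be rearranged to give

\[
\begin{cases}
(3 - \lambda)C + 2D = 0, \\
C + (4 - \lambda)D = 0.
\end{cases}
\tag{2.6}
\]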


This system of linear equations has non-zero solutions for C and D only
if the determinant of the coefficient matrix is 0, that is, if

\[ \begin{vmatrix} 3 - \lambda & 2 \\ 1 & 4 - \lambda \end{vmatrix} = 0. \]

Expanding this determinant gives $(3 - \lambda)(4 - \lambda) - 2 = 0$, which simplifies
to $\lambda^2 - 7\lambda + 10 = 0$, and we deduce that $\lambda = 5$ or $\lambda = 2$. We now
substitute these values of $\lambda$ in turn into Equations (2.6).

$\lambda = 5$: Equations (2.6) become

\[
\begin{cases}
-2C + 2D = 0, \\
C - D = 0,
\end{cases}
\]

which reduce to the single equation C = D. Choosing C = D = 1 provides
us with the solution $x = e^{5t}$, $y = e^{5t}$.

$\lambda = 2$: Equations (2.6) become

\[
\begin{cases}
C + 2D = 0, \\
C + 2D = 0,
\end{cases}
\]

which reduce to the single equation C = -2D. Choosing C = -2 and
D = 1 provides us with another solution, $x = -2e^{2t}$, $y = e^{2t}$.


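In vector form, these solutions are

\[
\mathbf{x}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} e^{5t}
\quad\text{and}\quad
\mathbf{x}_2 = \begin{bmatrix} -2 \\ 1 \end{bmatrix} e^{2t}.
\]

(Notice that these solution vectors are linearly independent.)

Thus the general solution can be written in the form

\[
\begin{bmatrix} x \\ y \end{bmatrix}
= \alpha \begin{bmatrix} 1 \\ 1 \end{bmatrix} e^{5t}
+ \beta \begin{bmatrix} -2 \\ 1 \end{bmatrix} e^{2t}.
\tag{2.7}
\]

The fact that $\mathbf{x}_1$ and $\mathbf{x}_2$ are linearly independent ensures that this
expression contains two arbitrary constants (in other words, one term
cannot be absorbed into the other).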


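(b) We seek the solution for which x = 1 and y = 4 when t = 0. Substituting
these values into Equation (2.7) gives the simultaneous linear equations

\[
\begin{cases}
1 = \alpha - 2\beta, \\
4 = \alpha + \beta.
\end{cases}
\]

Solving these equations gives $\alpha = 3$, $\beta = 1$, so the required particular
solution is

\[
\begin{bmatrix} x \\ y \end{bmatrix}
= 3 \begin{bmatrix} 1 \\ 1 \end{bmatrix} e^{5t}
+ \begin{bmatrix} -2 \\ 1 \end{bmatrix} e^{2t}.
\]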
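
The steps of Example 2.1 can be sketched in code. This is a minimal illustration rather than a general-purpose solver: the function names are hypothetical, and it assumes a 2 x 2 matrix with distinct real eigenvalues and a non-zero top-right entry.

```python
import math


def eigenpairs_2x2(a, b, c, d):
    """Eigenvalues and eigenvectors of [[a, b], [c, d]].

    Assumes distinct real eigenvalues and b != 0, so that the first
    eigenvector equation (a - lam)*C + b*D = 0 is solved by (C, D) = (b, lam - a).
    """
    tr, det = a + d, a * d - b * c
    root = math.sqrt(tr * tr - 4 * det)
    return [(lam, (b, lam - a)) for lam in ((tr + root) / 2, (tr - root) / 2)]


def fit_constants(pairs, x0, y0):
    """Solve alpha*v1 + beta*v2 = (x0, y0) by Cramer's rule."""
    (_, (p, q)), (_, (r, s)) = pairs
    det = p * s - r * q
    return (x0 * s - r * y0) / det, (p * y0 - q * x0) / det


def evaluate(pairs, consts, t):
    """x(t), y(t) as the linear combination of v*exp(lam*t) terms."""
    x = sum(c * v[0] * math.exp(lam * t) for c, (lam, v) in zip(consts, pairs))
    y = sum(c * v[1] * math.exp(lam * t) for c, (lam, v) in zip(consts, pairs))
    return x, y


pairs = eigenpairs_2x2(3, 2, 1, 4)    # matrix of coefficients of system (2.5)
consts = fit_constants(pairs, 1, 4)   # initial conditions x(0) = 1, y(0) = 4
```

The eigenvectors produced here are scalar multiples of those chosen in the example, so the fitted constants differ from 3 and 1, but the resulting particular solution is the same function of t.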
y 3 1 




$e^{\lambda t}$ can never be zero.


See the summary of results 
for non-invertible matrices in 


Unit 10. 


You will see later that 
choosing any non-zero value 
will work. 


Linear independence of 
vectors is defined in Unit 10. 


Here we are using the 
principle of superposition. 


We shall not prove that this 
is the case. 
You may have noticed that the method in Example 2.1(a) is similar to
that used for calculating eigenvalues and eigenvectors in Unit 10. This
similarity is not coincidental. Indeed, if we write the system of differential
equations (2.5) in matrix form $\dot{\mathbf{x}} = \mathbf{A}\mathbf{x}$, that is,

\[
\begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix}
= \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix},
\]

then the numbers $\lambda = 5$ and $\lambda = 2$ arising in the general solution (2.7) turn
out to be the eigenvalues of the coefficient matrix

\[ \mathbf{A} = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}, \]

and the vectors $[1 \;\; 1]^T$ and $[-2 \;\; 1]^T$ appearing in the general solution are
corresponding eigenvectors.


To see why this happens, consider a general system of homogeneous linear
constant-coefficient first-order differential equations $\dot{\mathbf{x}} = \mathbf{A}\mathbf{x}$, of any size.
Suppose that there is a solution of the form $\mathbf{x} = \mathbf{v}e^{\lambda t}$, where $\mathbf{v}$ is a constant
column vector. Then $\dot{\mathbf{x}} = \mathbf{v}\lambda e^{\lambda t}$, and the system of differential equations
becomes $\mathbf{v}\lambda e^{\lambda t} = \mathbf{A}\mathbf{v}e^{\lambda t}$. Dividing the latter equation by $e^{\lambda t}$ (which is never 0)
and rearranging, we have

\[ \mathbf{A}\mathbf{v} = \lambda\mathbf{v}. \]

Thus $\mathbf{v}$ is an eigenvector of $\mathbf{A}$, and $\lambda$ is the corresponding eigenvalue.


Conversely, we have the following result. 


Theorem 2.1 


If $\lambda$ is an eigenvalue of the matrix $\mathbf{A}$ corresponding to an eigenvector
$\mathbf{v}$, then $\mathbf{x} = \mathbf{v}e^{\lambda t}$ is a solution of the system of differential equations
$\dot{\mathbf{x}} = \mathbf{A}\mathbf{x}$.


Example 2.2 


A particle moves in the (x, y)-plane in such a way that its position (x, y) at 
any time t satisfies the simultaneous differential equations 


\[
\begin{cases}
\dot{x} = x + 4y, \\
\dot{y} = x - 2y.
\end{cases}
\]

Find the position (x, y) at time t if x(0) = 2 and y(0) = 3.


Solution 

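The matrix of coefficients is $\mathbf{A} = \begin{bmatrix} 1 & 4 \\ 1 & -2 \end{bmatrix}$. The eigenvectors of A are
$[4 \;\; 1]^T$ with corresponding eigenvalue 2, and $[1 \;\; -1]^T$ with corresponding
eigenvalue -3. The general solution is therefore

\[
\begin{bmatrix} x \\ y \end{bmatrix}
= \alpha \begin{bmatrix} 4 \\ 1 \end{bmatrix} e^{2t}
+ \beta \begin{bmatrix} 1 \\ -1 \end{bmatrix} e^{-3t},
\]

where $\alpha$ and $\beta$ are arbitrary constants.

Since x(0) = 2 and y(0) = 3, we have, on putting t = 0,

\[
\begin{cases}
2 = 4\alpha + \beta, \\
3 = \alpha - \beta.
\end{cases}
\]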
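You may like to check that
$\mathbf{A}[1 \;\; 1]^T = 5[1 \;\; 1]^T$ and $\mathbf{A}[-2 \;\; 1]^T = 2[-2 \;\; 1]^T$.


This theorem holds because if
$\mathbf{x} = \mathbf{v}e^{\lambda t}$, then
$\dot{\mathbf{x}} = \lambda\mathbf{v}e^{\lambda t} = \mathbf{A}\mathbf{v}e^{\lambda t} = \mathbf{A}\mathbf{x}$.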


These eigenvectors and 
eigenvalues were found in 
Unit 10, Exercise 2.2(a). 




Solving these equations gives $\alpha = 1$, $\beta = -2$, so the required particular
solution is

\[
\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} 4 \\ 1 \end{bmatrix} e^{2t}
- 2 \begin{bmatrix} 1 \\ -1 \end{bmatrix} e^{-3t}.
\]

In the above example the particle starts at the point (2, 3) when t = 0,
and follows a certain path as t increases. The ultimate direction of this
path is easy to determine, because $e^{-3t}$ is small when t is large, so we have
$[x \;\; y]^T \simeq [4 \;\; 1]^T e^{2t}$, that is, $x \simeq 4e^{2t}$, $y \simeq e^{2t}$, so $x \simeq 4y$ and $y \simeq 0.25x$.
Thus the solution approaches the line y = 0.25x as t increases.
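
The approach to the line y = 0.25x can be seen numerically by evaluating the ratio y/x along the particular solution of Example 2.2. The sketch below is illustrative only; the function name and the sample times are arbitrary choices.

```python
import math


def position(t):
    """Particular solution of Example 2.2: [4, 1]e^{2t} - 2[1, -1]e^{-3t}."""
    x = 4 * math.exp(2 * t) - 2 * math.exp(-3 * t)
    y = math.exp(2 * t) + 2 * math.exp(-3 * t)
    return x, y


# Ratio y/x at a few increasing times; it should settle towards 0.25.
ratios = [position(t)[1] / position(t)[0] for t in (0.0, 1.0, 2.0, 4.0)]
```

The ratio starts at 3/2 (the initial point (2, 3)) and decreases towards 0.25 as the term in $e^{-3t}$ dies away.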


*Exercise 2.2 


Use the above method to solve the system of differential equations

\[
\begin{cases}
\dot{x} = 5x + 2y, \\
\dot{y} = 2x + 5y,
\end{cases}
\]

given that x = 4 and y = 0 when t = 0.

The eigenvalues of $\begin{bmatrix} 5 & 2 \\ 2 & 5 \end{bmatrix}$
are 7 and 3, corresponding to
eigenvectors $[1 \;\; 1]^T$ and
$[1 \;\; -1]^T$, respectively.


The above method works equally well for larger systems. For example, 
consider the following example of a system of three differential equations. 
Example 2.3 

Find the general solution of the system of differential equations 


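\[
\begin{cases}
\dot{x} = 3x + 2y + 2z, \\
\dot{y} = 2x + 2y, \\
\dot{z} = 2x + 4z.
\end{cases}
\]

Solution

The matrix of coefficients is

\[ \mathbf{A} = \begin{bmatrix} 3 & 2 & 2 \\ 2 & 2 & 0 \\ 2 & 0 & 4 \end{bmatrix}. \]

The eigenvectors of A are $[2 \;\; 1 \;\; 2]^T$, $[1 \;\; 2 \;\; -2]^T$ and $[-2 \;\; 2 \;\; 1]^T$, corresponding to the eigenvalues
$\lambda_1 = 6$, $\lambda_2 = 3$ and $\lambda_3 = 0$, respectively. The general solution is therefore

\[
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
= \alpha \begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix} e^{6t}
+ \beta \begin{bmatrix} 1 \\ 2 \\ -2 \end{bmatrix} e^{3t}
+ \gamma \begin{bmatrix} -2 \\ 2 \\ 1 \end{bmatrix},
\]

where $\alpha$, $\beta$ and $\gamma$ are arbitrary constants.

These eigenvectors and
eigenvalues were found in
Unit 10, Example 3.1.

Note that the last term on the
right-hand side corresponds
to the term in $e^{\lambda_3 t} = e^{0t} = 1$.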
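
The eigenvalues and eigenvectors quoted in Example 2.3 can be checked directly against the defining relation Av = λv. The following is a minimal sketch using plain lists; the helper name `mat_vec` is hypothetical.

```python
def mat_vec(A, v):
    """Product of a matrix (list of rows) with a vector (list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


A = [[3, 2, 2],
     [2, 2, 0],
     [2, 0, 4]]

# Eigenvalue/eigenvector pairs quoted in Example 2.3.
pairs = [(6, [2, 1, 2]), (3, [1, 2, -2]), (0, [-2, 2, 1])]

# Each entry is True when A v equals lambda v exactly.
checks = [mat_vec(A, v) == [lam * x for x in v] for lam, v in pairs]
```

All entries are integers, so the comparison is exact and each check should come out true.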


*Exercise 2.3 


A particle moves in three-dimensional space in such a way that its position
(x, y, z) at any time t satisfies the simultaneous differential equations

\[
\begin{cases}
\dot{x} = 5x, \\
\dot{y} = x + 2y + z, \\
\dot{z} = x + y + 2z.
\end{cases}
\]

Find the position (x, y, z) at time t if x(0) = 4, y(0) = 6 and z(0) = 0.

The eigenvectors of
$\begin{bmatrix} 5 & 0 & 0 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{bmatrix}$
are $[2 \;\; 1 \;\; 1]^T$, $[0 \;\; 1 \;\; 1]^T$ and $[0 \;\; 1 \;\; -1]^T$,
corresponding to eigenvalues
5, 3 and 1, respectively.


The above method can be used to solve any system $\dot{\mathbf{x}} = \mathbf{A}\mathbf{x}$ of linear
constant-coefficient first-order differential equations for which the matrix
A has distinct real eigenvalues. We summarize the procedure as follows.
Procedure 2.1 Distinct real eigenvalues


To solve a system of linear constant-coefficient first-order differential
equations $\dot{\mathbf{x}} = \mathbf{A}\mathbf{x}$, where $\mathbf{A}$ is an n x n matrix with distinct real
eigenvalues:


(a) find the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ and a corresponding set of
eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$;


(b) write down the general solution in the form

\[ \mathbf{x} = C_1\mathbf{v}_1 e^{\lambda_1 t} + C_2\mathbf{v}_2 e^{\lambda_2 t} + \cdots + C_n\mathbf{v}_n e^{\lambda_n t}, \]

where $C_1, C_2, \ldots, C_n$ are arbitrary constants.
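
Step (b) of Procedure 2.1 can be sketched for a system of any size once the eigenvalues and eigenvectors are known. The code below is an illustration under stated assumptions: the function names are hypothetical, the eigenpairs are those quoted in Example 2.3, and the constants are arbitrary sample values.

```python
import math


def general_solution(pairs, consts, t):
    """x(t) = sum of C_i * v_i * exp(lam_i * t) over the eigenpairs."""
    n = len(pairs[0][1])
    return [sum(c * v[i] * math.exp(lam * t) for c, (lam, v) in zip(consts, pairs))
            for i in range(n)]


def derivative(pairs, consts, t):
    """Term-by-term derivative: each term picks up a factor lam_i."""
    n = len(pairs[0][1])
    return [sum(c * lam * v[i] * math.exp(lam * t) for c, (lam, v) in zip(consts, pairs))
            for i in range(n)]


def mat_vec(A, v):
    """Product of a matrix (list of rows) with a vector (list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


A = [[3, 2, 2], [2, 2, 0], [2, 0, 4]]                     # matrix of Example 2.3
pairs = [(6, [2, 1, 2]), (3, [1, 2, -2]), (0, [-2, 2, 1])]
consts = (1.0, -2.0, 0.5)                                  # arbitrary sample constants

x = general_solution(pairs, consts, 0.2)
# Largest component of x' - A x; it should vanish up to rounding error.
residual = max(abs(p - q) for p, q in zip(derivative(pairs, consts, 0.2), mat_vec(A, x)))
```

A residual of essentially zero for any choice of constants is what the principle of superposition leads us to expect.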


In the next subsection we investigate what happens when the eigenvalues of 
the matrix A are not distinct, or when they are complex numbers. 


2.2 Two variations 


The above method relies on the fact that eigenvectors corresponding to dis- 
tinct eigenvalues are linearly independent, which ensures that the solution 
given in Procedure 2.1(b) contains n arbitrary constants and so is the re- 
quired general solution. If the eigenvalues are not distinct, then we may 
not be able to write down n linearly independent eigenvectors, so when we 
attempt to construct the solution for x given in Procedure 2.1(b), we shall 
find that it contains too few constants. In the following example we are 
able to find a sufficient number of eigenvectors in spite of the fact that an 
eigenvalue is repeated. 


Example 2.4 


Find the general solution of the system of differential equations 


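\[
\begin{cases}
\dot{x} = 5x + 3z, \\
\dot{y} = 3x + 2y + 3z, \\
\dot{z} = -6x - 4z.
\end{cases}
\]

Solution

The matrix of coefficients is

\[ \mathbf{A} = \begin{bmatrix} 5 & 0 & 3 \\ 3 & 2 & 3 \\ -6 & 0 & -4 \end{bmatrix}. \]

Using the techniques of Unit 10, we can calculate the eigenvalues of A. They are $\lambda = -1$ and $\lambda = 2$
(repeated), corresponding to eigenvectors $[1 \;\; 1 \;\; -2]^T$ and $[0 \;\; 1 \;\; 0]^T$,
respectively.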


You need to understand what goes wrong before we show you how to put it 
right, so suppose that we try to follow Procedure 2.1(b), and let 


\[ \mathbf{x} = C_1 [1 \;\; 1 \;\; -2]^T e^{-t} + C_2 [0 \;\; 1 \;\; 0]^T e^{2t}. \]


This is certainly a solution, but it is not the general solution because it 
contains only two arbitrary constants and we require three. 


The answer to our difficulty lies in our method of calculating the eigenvectors 
when the eigenvalue is repeated. The eigenvector equations are 


\[
\begin{cases}
(5 - \lambda)x + 3z = 0, \\
3x + (2 - \lambda)y + 3z = 0, \\
-6x + (-4 - \lambda)z = 0.
\end{cases}
\]
This result is not obvious; we
ask you to take it on trust. 




$\lambda = 2$: The eigenvector equations become

\[ 3x + 3z = 0, \quad 3x + 3z = 0 \quad\text{and}\quad -6x - 6z = 0. \]

All three equations give z = -x, but there is no restriction on y. It follows
that any vector of the form $[k \;\; l \;\; -k]^T$ is an eigenvector corresponding to
$\lambda = 2$. (In particular, the vector $[0 \;\; 1 \;\; 0]^T$ mentioned above is an
eigenvector corresponding to $\lambda = 2$.)


Remember that we need to find three linearly independent eigenvectors,
and it might appear that we have found only two, namely $[1 \;\; 1 \;\; -2]^T$
corresponding to $\lambda = -1$ and $[k \;\; l \;\; -k]^T$ corresponding to $\lambda = 2$. However,
the numbers k and l are arbitrary, so there are infinitely many vectors in the
second category; our task is to choose two that are linearly independent. If
we write

\[ [k \;\; l \;\; -k]^T = k[1 \;\; 0 \;\; -1]^T + l[0 \;\; 1 \;\; 0]^T, \]

then we see at once that $[1 \;\; 0 \;\; -1]^T$ and $[0 \;\; 1 \;\; 0]^T$ are suitable candidates.
Thus

\[
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
= C_1 \begin{bmatrix} 1 \\ 1 \\ -2 \end{bmatrix} e^{-t}
+ \left( C_2 \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}
+ C_3 \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \right) e^{2t},
\]

where $C_1$, $C_2$ and $C_3$ are arbitrary constants, is a solution, by Theorem 2.1
and the principle of superposition. The fact that the three eigenvectors are
linearly independent ensures that there are three arbitrary constants, and
hence that this is the general solution.
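
The linear independence of the three eigenvectors in Example 2.4 can be confirmed by checking that the matrix with these vectors as its columns has a non-zero determinant. The sketch below is illustrative; the helper names are hypothetical.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def mat_vec(A, v):
    """Product of a matrix (list of rows) with a vector (list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


A = [[5, 0, 3],
     [3, 2, 3],
     [-6, 0, -4]]

v1, v2, v3 = [1, 1, -2], [1, 0, -1], [0, 1, 0]

# Columns of V are the three eigenvectors; a non-zero determinant
# means that they are linearly independent.
V = [[v1[r], v2[r], v3[r]] for r in range(3)]
independent = det3(V) != 0
```

The same helper `mat_vec` can be used to confirm that each vector satisfies Av = λv with λ = -1, 2 and 2 respectively.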


*Exercise 2.4 


A particle moves in three-dimensional space in such a way that its position 
(x,y,z) at any time t satisfies the simultaneous differential equations 


\[
\begin{cases}
\dot{x} = z, \\
\dot{y} = y, \\
\dot{z} = x.
\end{cases}
\]

Find the position (x, y, z) at time t if x(0) = 7, y(0) = 5 and z(0) = 1.


In Example 2.4 we were able to determine the general solution even though 
there were only two distinct eigenvalues. This is because we were able to 
find three linearly independent eigenvectors. We now consider a situation 
in which there are too few linearly independent eigenvectors. 


Suppose that we try to apply the above method to find the general solution 
of the system of differential equations 


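\[
\begin{cases}
\dot{x} = x, \\
\dot{y} = x + y.
\end{cases}
\]

The matrix of coefficients is

\[ \mathbf{A} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}, \]

and the eigenvalues are $\lambda = 1$ (repeated). We now substitute this eigenvalue
into the eigenvector equations

\[ (1 - \lambda)x = 0 \quad\text{and}\quad x + (1 - \lambda)y = 0. \]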


Here k and l are any real
numbers, not both zero.


The eigenvalues of
$\begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}$
are $\lambda = -1$,
corresponding to an
eigenvector $[1 \;\; 0 \;\; -1]^T$, and
$\lambda = 1$ (repeated),
corresponding to an
eigenvector $[k \;\; l \;\; k]^T$.


Since A is a triangular 


matrix, the eigenvalues are 
simply the diagonal entries. 




$\lambda = 1$: The eigenvector equations become 0 = 0 and x = 0. The first
equation tells us nothing; the second equation gives x = 0, but imposes no
restriction on y. It follows that $[0 \;\; k]^T$ is an eigenvector corresponding to $\lambda = 1$,
for any non-zero value of k. Thus, choosing k = 1, we have the solution

\[ \begin{bmatrix} x \\ y \end{bmatrix} = C \begin{bmatrix} 0 \\ 1 \end{bmatrix} e^{t}. \]


But this is clearly not the general solution, since it has only one arbitrary 
constant instead of two; the procedure has failed. 


Before extending our matrix procedure to cover the above case, it is
illuminating to solve the given system of differential equations directly. The
first equation, $\dot{x} = x$, has solution $x = De^{t}$, where D is an arbitrary
constant. Substituting this solution into the second equation gives $\dot{y} = y + De^{t}$,
which can be solved by the integrating factor method of Unit 2 to give
$y = Ce^{t} + Dte^{t}$. Thus the solution for which we are searching is

\[
\begin{bmatrix} x \\ y \end{bmatrix}
= C \underbrace{\begin{bmatrix} 0 \\ 1 \end{bmatrix}}_{\text{eigenvector}} e^{t}
+ D \underbrace{\begin{bmatrix} 1 \\ t \end{bmatrix}}_{\text{notice the } t \text{ here}} e^{t},
\]

and this is the general solution since it contains two arbitrary constants.


The above solution provides a clue to a general method. The origins of the 
first term on the right-hand side are clear — it corresponds to an eigenvector 
— but let us look more closely at the second term on the right-hand side. 
We have 


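\[
D \begin{bmatrix} 1 \\ t \end{bmatrix} e^{t}
= D \begin{bmatrix} 0 \\ 1 \end{bmatrix} t e^{t} + D \begin{bmatrix} 1 \\ 0 \end{bmatrix} e^{t}
= D \Bigl( \underbrace{\begin{bmatrix} 0 \\ 1 \end{bmatrix}}_{\text{eigenvector}} t
+ \begin{bmatrix} 1 \\ 0 \end{bmatrix} \Bigr) e^{t}.
\]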


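This suggests that if we wish to solve the system $\dot{\mathbf{x}} = \mathbf{A}\mathbf{x}$ for which $\mathbf{A}$ has a
repeated eigenvalue $\lambda$, giving rise to too few eigenvectors, it may be helpful
to search for solutions of the form

\[ \mathbf{x} = (\mathbf{v}t + \mathbf{b})e^{\lambda t}, \]

where $\mathbf{v}$ is an eigenvector corresponding to $\lambda$, and $\mathbf{b}$ is to be determined.


Let us suppose that our system has a solution of this form, and examine the
consequences. Substituting this proposed solution into $\dot{\mathbf{x}} = \mathbf{A}\mathbf{x}$ gives

\[ \lambda(\mathbf{v}t + \mathbf{b})e^{\lambda t} + \mathbf{v}e^{\lambda t} = \mathbf{A}(\mathbf{v}t + \mathbf{b})e^{\lambda t}, \]

or, on division by $e^{\lambda t}$ and rearranging,

\[ (\mathbf{A}\mathbf{v} - \lambda\mathbf{v})t + (\mathbf{A}\mathbf{b} - \lambda\mathbf{b}) = \mathbf{v}. \]

The first term disappears, because $\mathbf{v}$ is an eigenvector of $\mathbf{A}$ corresponding
to the eigenvalue $\lambda$ (so $\mathbf{A}\mathbf{v} = \lambda\mathbf{v}$), and we are left with $(\mathbf{A} - \lambda\mathbf{I})\mathbf{b} = \mathbf{v}$. If
this equation can be solved for $\mathbf{b}$, then we shall have found a solution of the
required form.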


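For example, in the above case we had $\mathbf{A} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$ and $\lambda = 1$, so

\[
\mathbf{A} - \lambda\mathbf{I} = \mathbf{A} - \mathbf{I} = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}
\quad\text{and}\quad
\mathbf{v} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.
\]

Thus, if $\mathbf{b} = [b_1 \;\; b_2]^T$, the equation $(\mathbf{A} - \lambda\mathbf{I})\mathbf{b} = \mathbf{v}$ becomes

\[
\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} b_1 \\ b_2 \end{bmatrix}
= \begin{bmatrix} 0 \\ 1 \end{bmatrix}.
\]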
There are two arbitrary
constants since $[0 \;\; 1]^T$ and
$[1 \;\; t]^T$ are linearly
independent.




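Any solution of this equation for $b_1$ and $b_2$ is acceptable. In fact, this
equation gives just $b_1 = 1$ (with no condition on $b_2$), and we can take $\mathbf{b}$ to
be $[1 \;\; 0]^T$, giving the solution

\[
\left( \begin{bmatrix} 0 \\ 1 \end{bmatrix} t + \begin{bmatrix} 1 \\ 0 \end{bmatrix} \right) e^{t}
= \begin{bmatrix} 1 \\ t \end{bmatrix} e^{t}.
\]

Now we have two linearly independent solutions of our system of differential
equations, namely $[0 \;\; 1]^T e^{t}$ and $[1 \;\; t]^T e^{t}$, so the general solution is

\[
\begin{bmatrix} x \\ y \end{bmatrix}
= C \underbrace{\begin{bmatrix} 0 \\ 1 \end{bmatrix} e^{t}}_{\text{solution } \mathbf{v}e^{\lambda t}}
+ D \underbrace{\begin{bmatrix} 1 \\ t \end{bmatrix} e^{t}}_{\text{solution } (\mathbf{v}t + \mathbf{b})e^{\lambda t}},
\]

as we found before.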
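
The two conditions used above, namely that b satisfies (A - λI)b = v and that (vt + b)e^{λt} then satisfies the differential equations, can be checked numerically. This is a minimal sketch for the 2 x 2 example just discussed; the function names are hypothetical.

```python
import math

A = [[1, 0],
     [1, 1]]
lam = 1
v = [0, 1]   # eigenvector corresponding to lam = 1
b = [1, 0]   # chosen so that (A - lam*I) b = v


def mat_vec(M, u):
    """Product of a matrix (list of rows) with a vector (list)."""
    return [sum(m * x for m, x in zip(row, u)) for row in M]


# A - lam*I, built entry by entry.
A_minus_lamI = [[A[r][c] - (lam if r == c else 0) for c in range(2)] for r in range(2)]


def x(t):
    """Second solution (v*t + b) * exp(lam*t)."""
    return [(v[i] * t + b[i]) * math.exp(lam * t) for i in range(2)]


def x_dot(t):
    """Product rule: the derivative is (v + lam*(v*t + b)) * exp(lam*t)."""
    return [(v[i] + lam * (v[i] * t + b[i])) * math.exp(lam * t) for i in range(2)]


# Largest component of x' - A x at a sample time; it should be essentially zero.
residual = max(abs(p - q) for p, q in zip(x_dot(0.7), mat_vec(A, x(0.7))))
```

Replacing b by [1, b2] for any other value of b2 should leave the residual at zero, in line with the remark that there is no condition on b2.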


Procedure 2.2 Repeated real eigenvalues


To solve a system of linear constant-coefficient first-order differential
equations $\dot{\mathbf{x}} = \mathbf{A}\mathbf{x}$, where $\mathbf{A}$ is an n x n matrix with some repeated
real eigenvalues (where any eigenvalue is repeated at most once), do
the following.


(a) For the non-repeated eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$ and corresponding
eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$, write down the set S of solutions of the
form $\mathbf{v}_i e^{\lambda_i t}$.


(b) Examine the eigenvector equations corresponding to each repeated
eigenvalue, and attempt to construct two linearly independent
eigenvectors to add two solutions to S.


If this fails, then for each repeated eigenvalue $\lambda_i$ that gives rise
to only one eigenvector $\mathbf{v}_i$, construct a solution $(\mathbf{v}_i t + \mathbf{b}_i)e^{\lambda_i t}$ for
which $(\mathbf{A} - \lambda_i\mathbf{I})\mathbf{b}_i = \mathbf{v}_i$, and add it and the solution $\mathbf{v}_i e^{\lambda_i t}$ to the
set S.

(c) If S contains n linearly independent solutions, then the general
solution of the system of differential equations is an arbitrary linear
combination of the solutions in S.


*Exercise 2.5 


A particle moves in the plane in such a way that its position (x,y) at any 
time ¢ satisfies the simultaneous differential equations 


\[
\begin{cases}
\dot{x} = 2x + 3y, \\
\dot{y} = 2y.
\end{cases}
\]

Find the position (x,y) of the particle at time t if its position is (4,3) at 
time t = 0. 


So far, all our examples and exercises have involved real eigenvalues. We 
now investigate what happens when the characteristic equation has at least 
one complex root (giving a complex eigenvalue). In fact, since the argu- 
ments leading to Procedure 2.1 did not rely on the eigenvalues being real, 
Procedure 2.1 also applies whenever the eigenvalues are distinct — it does 
not matter whether they are real or complex. However, using Procedure 2.1 
for complex eigenvalues leads to a complex-valued solution involving com- 
plex arbitrary constants. For a system of differential equations with real 
coefficients, we would generally want real-valued solutions. So here we see 
how to adapt Procedure 2.1 to obtain a real-valued solution when some of 
the eigenvalues are complex. 




The value of $b_2$ is irrelevant,
since it is absorbed into the 
arbitrary constant C. 


This procedure covers only 
the case where an eigenvalue 
is repeated once. It could be 
extended to cover the case 
where an eigenvalue is 
repeated several times, but 
we choose not to generalize. 


This arbitrary linear 
combination will contain n 
arbitrary constants. 


For matrices A which are real 
(ours always are), complex 
roots occur as complex 
conjugate pairs $a \pm bi$.
We begin with an example which is simple enough for us to be able to apply
a direct method in order to find the general solution. We then solve the 
system again by applying a matrix method based on Procedure 2.1, which 
has the advantage that it can be extended to larger systems of equations. 


Example 2.5 


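Solve the system of differential equations

\[
\begin{cases}
\dot{x} = y, \\
\dot{y} = -x.
\end{cases}
\tag{2.8}
\]

Solution

If we differentiate the first equation, giving $\ddot{x} = \dot{y}$, and substitute for $\dot{y}$
in the second equation, we obtain $\ddot{x} = -x$, so $\ddot{x} + x = 0$. This second-order
differential equation has auxiliary equation $\lambda^2 + 1 = 0$, so the general
solution is $x = C\cos t + D\sin t$. Therefore $y = \dot{x} = -C\sin t + D\cos t$. Thus
the general solution of Equations (2.8) is

\[
\begin{bmatrix} x \\ y \end{bmatrix}
= C \begin{bmatrix} \cos t \\ -\sin t \end{bmatrix}
+ D \begin{bmatrix} \sin t \\ \cos t \end{bmatrix}.
\]
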
Now we shall obtain the same general solution using a matrix method based 


on Procedure 2.1. 


The matrix of coefficients is

\[ \mathbf{A} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}. \]

The characteristic equation is $\lambda^2 + 1 = 0$, giving the complex eigenvalues
$\lambda = i$ and $\lambda = -i$. The eigenvector equations are

\[ -\lambda x + y = 0 \quad\text{and}\quad -x - \lambda y = 0. \]

$\lambda = i$: The eigenvector equations become

\[ -ix + y = 0 \quad\text{and}\quad -x - iy = 0, \]

which reduce to the single equation y = ix. It follows that $[1 \;\; i]^T$ is an
eigenvector corresponding to $\lambda = i$.

$\lambda = -i$: The eigenvector equations become

\[ ix + y = 0 \quad\text{and}\quad -x + iy = 0, \]

which reduce to the single equation y = -ix. It follows that $[1 \;\; -i]^T$ is an
eigenvector corresponding to $\lambda = -i$.


Now since Procedure 2.1 works for complex as well as real eigenvalues, the
general solution of the given system of differential equations can be written
as

\[
\begin{bmatrix} x \\ y \end{bmatrix}
= C \begin{bmatrix} 1 \\ i \end{bmatrix} e^{it}
+ D \begin{bmatrix} 1 \\ -i \end{bmatrix} e^{-it},
\]

where C and D are arbitrary complex constants.


If we are interested only in real-valued solutions, then we need to rewrite
the above solution in such a way as to eliminate the terms involving i. We
do this by using Euler's formula, which gives

\[ e^{it} = \cos t + i\sin t \quad\text{and}\quad e^{-it} = \cos t - i\sin t \]

(replacing t by -t and using cos(-t) = cos t and sin(-t) = -sin t).
See Unit 3.


Note that if we want real 
solutions, then C' and D must 
be real. 


Note that the first equation is 
i times the second equation. 


Here, the second equation is 
i times the first equation. 


Note that the \ and v occur 
in complex conjugate pairs. 
(The complex conjugate of a 
vector v is the vector V whose 
elements are the complex 
conjugates of the respective 
elements of v.) 


You saw Euler’s formula in 
Unit 1. It was used in a 
similar way in Unit 3. 




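Then

\[
\begin{aligned}
x &= C(\cos t + i\sin t) + D(\cos t - i\sin t) \\
  &= (C + D)\cos t + (Ci - Di)\sin t, \\
y &= Ci(\cos t + i\sin t) - Di(\cos t - i\sin t) \\
  &= (Ci - Di)\cos t - (C + D)\sin t.
\end{aligned}
\]

Writing $\alpha = C + D$ and $\beta = Ci - Di$, we have

\[ x = \alpha\cos t + \beta\sin t \quad\text{and}\quad y = \beta\cos t - \alpha\sin t. \]

Since C and D are arbitrary
complex constants, so are $\alpha$
and $\beta$.

Thus we can write the general solution of Equations (2.8) as

\[
\begin{bmatrix} x \\ y \end{bmatrix}
= \alpha \begin{bmatrix} \cos t \\ -\sin t \end{bmatrix}
+ \beta \begin{bmatrix} \sin t \\ \cos t \end{bmatrix},
\]

where $\alpha$ and $\beta$ are arbitrary complex constants. For real-valued solutions
we must use only real values for $\alpha$ and $\beta$.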


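In Example 2.5, notice that

\[
\begin{bmatrix} 1 \\ i \end{bmatrix} e^{it}
= \begin{bmatrix} \cos t + i\sin t \\ -\sin t + i\cos t \end{bmatrix}
= \begin{bmatrix} \cos t \\ -\sin t \end{bmatrix}
+ i \begin{bmatrix} \sin t \\ \cos t \end{bmatrix},
\]

so

\[
\operatorname{Re}\left( \begin{bmatrix} 1 \\ i \end{bmatrix} e^{it} \right)
= \begin{bmatrix} \cos t \\ -\sin t \end{bmatrix}
\quad\text{and}\quad
\operatorname{Im}\left( \begin{bmatrix} 1 \\ i \end{bmatrix} e^{it} \right)
= \begin{bmatrix} \sin t \\ \cos t \end{bmatrix},
\]

and these are the expressions that appear in our final solution.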


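In general, complex eigenvalues of a real matrix $\mathbf{A}$ occur in complex
conjugate pairs $\lambda$ and $\bar{\lambda}$, with corresponding complex conjugate eigenvectors
$\mathbf{v}$ and $\bar{\mathbf{v}}$. These give rise to two complex solutions, $\mathbf{v}e^{\lambda t}$ and $\bar{\mathbf{v}}e^{\bar{\lambda} t}$, which
contribute the terms

\[ C\mathbf{v}e^{\lambda t} + D\bar{\mathbf{v}}e^{\bar{\lambda} t} \]

to the general solution (where C and D are arbitrary complex constants).


To obtain a real-valued solution, this expression can be rewritten in the form

\[ \alpha\operatorname{Re}(\mathbf{v}e^{\lambda t}) + \beta\operatorname{Im}(\mathbf{v}e^{\lambda t}), \]

where $\alpha$ and $\beta$ are arbitrary real constants.

The components of $\operatorname{Re}(\mathbf{v}e^{\lambda t})$
and $\operatorname{Im}(\mathbf{v}e^{\lambda t})$ are sinusoidal
functions.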
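Procedure 2.3 Complex eigenvalues

To obtain a real-valued solution of a system of linear constant-coefficient
first-order differential equations $\dot{\mathbf{x}} = A\mathbf{x}$, where $A$ is an $n \times n$ matrix
with distinct eigenvalues, some of which are complex (occurring in complex
conjugate pairs $\lambda$ and $\overline{\lambda}$, with corresponding complex conjugate
eigenvectors $\mathbf{v}$ and $\overline{\mathbf{v}}$), do the following.

(a) Find the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ and a corresponding set of
eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$.

(b) Write down the general solution in the form
\[ \mathbf{x} = C_1 \mathbf{v}_1 e^{\lambda_1 t} + C_2 \mathbf{v}_2 e^{\lambda_2 t} + \cdots + C_n \mathbf{v}_n e^{\lambda_n t}, \]
where $C_1, C_2, \ldots, C_n$ are arbitrary complex constants.

(c) Replace the terms $\mathbf{v} e^{\lambda t}$ and $\overline{\mathbf{v}} e^{\overline{\lambda} t}$ appearing in the general solution
with $\operatorname{Re}(\mathbf{v} e^{\lambda t})$ and $\operatorname{Im}(\mathbf{v} e^{\lambda t})$.

The general solution will then be real-valued for real $C_1, C_2, \ldots, C_n$.

If any of the real eigenvalues
are repeated, this procedure
will need to be adapted as in
the earlier procedure for repeated eigenvalues.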
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Example 2.6 
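Solve the system of differential equations
\[
\begin{cases}
\dot{x} = 3x - y, \\
\dot{y} = 2x + y,
\end{cases}
\]
given that $x = 3$ and $y = 1$ when $t = 0$.

Solution

The matrix of coefficients is
\[ A = \begin{bmatrix} 3 & -1 \\ 2 & 1 \end{bmatrix}. \]
The eigenvalues of $A$ are $\lambda = 2 + i$ and $\overline{\lambda} = 2 - i$, corresponding to
eigenvectors $\mathbf{v} = [1 \ \ 1 - i]^T$ and $\overline{\mathbf{v}} = [1 \ \ 1 + i]^T$, respectively. So the general
solution can be written as
\[
\mathbf{x} = C \mathbf{v} e^{\lambda t} + D \overline{\mathbf{v}} e^{\overline{\lambda} t}
= C \begin{bmatrix} 1 \\ 1 - i \end{bmatrix} e^{(2+i)t}
+ D \begin{bmatrix} 1 \\ 1 + i \end{bmatrix} e^{(2-i)t},
\]
where $C$ and $D$ are arbitrary complex constants.

To obtain a real-valued solution, we follow Procedure 2.3 and write
\[
\begin{aligned}
\mathbf{v} e^{\lambda t}
&= \begin{bmatrix} 1 \\ 1 - i \end{bmatrix} e^{(2+i)t}
 = \begin{bmatrix} 1 \\ 1 - i \end{bmatrix} e^{2t} e^{it} \\
&= e^{2t} \begin{bmatrix} 1 \\ 1 - i \end{bmatrix} (\cos t + i \sin t) \\
&= e^{2t} \begin{bmatrix} \cos t + i \sin t \\ (1 - i)(\cos t + i \sin t) \end{bmatrix} \\
&= e^{2t} \begin{bmatrix} \cos t + i \sin t \\ (\cos t + \sin t) + i(\sin t - \cos t) \end{bmatrix} \\
&= \underbrace{e^{2t} \begin{bmatrix} \cos t \\ \cos t + \sin t \end{bmatrix}}_{\text{real part}}
 + i \, \underbrace{e^{2t} \begin{bmatrix} \sin t \\ \sin t - \cos t \end{bmatrix}}_{\text{imaginary part}}.
\end{aligned}
\]

The real-valued general solution of the given system of equations is therefore
\[
\begin{bmatrix} x \\ y \end{bmatrix}
= \alpha e^{2t} \begin{bmatrix} \cos t \\ \cos t + \sin t \end{bmatrix}
+ \beta e^{2t} \begin{bmatrix} \sin t \\ \sin t - \cos t \end{bmatrix},
\tag{2.9}
\]
where $\alpha$ and $\beta$ are arbitrary real constants.

In order to find the required particular solution, we substitute $x = 3$, $y = 1$
and $t = 0$ into Equation (2.9), to obtain
\[
\begin{bmatrix} 3 \\ 1 \end{bmatrix}
= \alpha \begin{bmatrix} 1 \\ 1 \end{bmatrix}
+ \beta \begin{bmatrix} 0 \\ -1 \end{bmatrix},
\]
so $3 = \alpha$ and $1 = \alpha - \beta$, giving $\beta = 2$, and the solution is therefore
\[
\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} e^{2t}(3 \cos t + 2 \sin t) \\ e^{2t}(\cos t + 5 \sin t) \end{bmatrix}.
\]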
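The steps of Procedure 2.3, as applied in Example 2.6, can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names (`complex_solution`, `real_general_solution`, `residual`) are our own illustrative choices and are not part of the course's computer algebra package. It forms $\alpha \operatorname{Re}(\mathbf{v} e^{\lambda t}) + \beta \operatorname{Im}(\mathbf{v} e^{\lambda t})$ and estimates how closely the result satisfies $\dot{\mathbf{x}} = A\mathbf{x}$ using a central difference.

```python
import cmath

# Coefficient matrix and one complex eigenpair, taken from Example 2.6.
A = [[3.0, -1.0], [2.0, 1.0]]
lam = complex(2, 1)                    # eigenvalue 2 + i
v = [complex(1, 0), complex(1, -1)]    # eigenvector [1, 1 - i]^T


def complex_solution(t):
    """Return the complex solution v * exp(lam * t) as two complex numbers."""
    factor = cmath.exp(lam * t)
    return [component * factor for component in v]


def real_general_solution(alpha, beta, t):
    """Return alpha * Re(v e^{lam t}) + beta * Im(v e^{lam t}) as [x, y]."""
    z = complex_solution(t)
    return [alpha * zc.real + beta * zc.imag for zc in z]


def residual(alpha, beta, t, step=1e-6):
    """Largest component of |dx/dt - A x|, with dx/dt from a central difference."""
    ahead = real_general_solution(alpha, beta, t + step)
    behind = real_general_solution(alpha, beta, t - step)
    here = real_general_solution(alpha, beta, t)
    worst = 0.0
    for i in range(2):
        derivative = (ahead[i] - behind[i]) / (2 * step)
        rhs = A[i][0] * here[0] + A[i][1] * here[1]
        worst = max(worst, abs(derivative - rhs))
    return worst


# Initial condition x(0) = 3, y(0) = 1. At t = 0 we have Re(v) = [1, 1]
# and Im(v) = [0, -1], so 3 = alpha and 1 = alpha - beta.
alpha = 3.0
beta = alpha - 1.0

print(real_general_solution(alpha, beta, 0.0))
print(residual(alpha, beta, 0.5))
```

The residual is a finite-difference estimate, so it is small rather than exactly zero; it is intended only as a sanity check on the hand calculation.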


We shall ask you to solve systems of linear constant-coefficient first-order 
differential equations by hand only for 2 x 2 and 3 x 3 coefficient matrices, 
but you may well encounter larger systems when you use the computer 
algebra package for the course. 
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Section 2 


End-of-section Exercises 


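*Exercise 2.6

Using the eigenvalues and eigenvectors given, find the real-valued solution of
each of the following systems of differential equations, given that $x = y = 1$
when $t = 0$.

(a)
\[
\begin{cases}
\dot{x} = 2x + 3y \\
\dot{y} = 2x + y
\end{cases}
\]

(The matrix $\begin{bmatrix} 2 & 3 \\ 2 & 1 \end{bmatrix}$ has eigenvalues 4 and $-1$, corresponding to
eigenvectors $[3 \ \ 2]^T$ and $[1 \ \ -1]^T$, respectively.)

(b)
\[
\begin{cases}
\dot{x} = -3x - 2y \\
\dot{y} = 4x + y
\end{cases}
\]

(The matrix $\begin{bmatrix} -3 & -2 \\ 4 & 1 \end{bmatrix}$ has eigenvalues $-1 + 2i$ and $-1 - 2i$, corresponding
to eigenvectors $[1 \ \ -1 - i]^T$ and $[1 \ \ -1 + i]^T$, respectively.)

Exercise 2.7

Write down the general solution of each of the following systems of equations.

(a)
\[
\begin{cases}
\dot{x} = x - z \\
\dot{y} = x + 2y + z \\
\dot{z} = 2x + 2y + 3z
\end{cases}
\]

(The matrix $\begin{bmatrix} 1 & 0 & -1 \\ 1 & 2 & 1 \\ 2 & 2 & 3 \end{bmatrix}$ has eigenvalues 1, 2 and 3, corresponding
to eigenvectors $[1 \ \ -1 \ \ 0]^T$, $[-2 \ \ 1 \ \ 2]^T$ and $[1 \ \ -1 \ \ -2]^T$, respectively.)

(b)
\[
\begin{cases}
\dot{x} = 5x - 6y - 6z \\
\dot{y} = -x + 4y + 2z \\
\dot{z} = 3x - 6y - 4z
\end{cases}
\]

(The matrix $\begin{bmatrix} 5 & -6 & -6 \\ -1 & 4 & 2 \\ 3 & -6 & -4 \end{bmatrix}$ has eigenvalues 1 and 2 (repeated), with
corresponding eigenvectors $[3 \ \ -1 \ \ 3]^T$ and $[2l + 2k \ \ l \ \ k]^T$, respectively,
where $k$ and $l$ are arbitrary real numbers, not both zero.)

Exercise 2.8

Find the general real-valued solution of the following system of equations.
\[
\begin{cases}
\dot{x} = x + z \\
\dot{y} = y \\
\dot{z} = -x + y
\end{cases}
\]
Find the solution for which $x = y = 1$ and $z = 2$ when $t = 0$.

(The matrix $\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 1 & 0 \end{bmatrix}$ has eigenvalues 1, $\lambda = \tfrac{1}{2} + \tfrac{\sqrt{3}}{2} i$ and $\overline{\lambda} = \tfrac{1}{2} - \tfrac{\sqrt{3}}{2} i$,
corresponding to the eigenvectors $[1 \ \ 1 \ \ 0]^T$, $\mathbf{v} = [1 \ \ 0 \ \ -\tfrac{1}{2} + \tfrac{\sqrt{3}}{2} i]^T$ and
$\overline{\mathbf{v}} = [1 \ \ 0 \ \ -\tfrac{1}{2} - \tfrac{\sqrt{3}}{2} i]^T$, respectively.)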




Unit 11 Systems of differential equations 
3 First-order inhomogeneous systems 


In the previous section you saw how to solve a system of differential equations
of the form $\dot{\mathbf{x}} = A\mathbf{x}$, where $A$ is a given constant-coefficient matrix. We now
extend our discussion to systems of the form
\[ \dot{\mathbf{x}} = A\mathbf{x} + \mathbf{h}(t), \]
where $\mathbf{h}(t)$ is a given function of $t$. Our method involves finding a 'particular
integral' for the system, and mirrors the approach we took for inhomogeneous
second-order differential equations in Unit 3.

Here we write $\mathbf{h}(t)$ to
emphasize that $\mathbf{h}$ is a
function of $t$. Henceforth we
shall abbreviate this to $\mathbf{h}$.


3.1 A basic result 


In Unit 3 we discussed inhomogeneous differential equations such as
\[ \frac{d^2 y}{dx^2} + 9y = 2e^{3x} + 18x + 18. \tag{3.1} \]

See Unit 3, Example 2.8.

To solve such an equation, we proceed as follows.

(a) We first find the complementary function of the corresponding homogeneous
equation
\[ \frac{d^2 y}{dx^2} + 9y = 0, \]
which is
\[ y_c = C_1 \cos 3x + C_2 \sin 3x, \]
where $C_1$ and $C_2$ are arbitrary constants.

(b) We then find a particular integral of the inhomogeneous equation (3.1),
\[ y_p = \tfrac{1}{9} e^{3x} + 2x + 2. \]

The general solution $y$ of the original equation is then obtained by adding
these two functions (using the principle of superposition) to give
\[ y = y_c + y_p = (C_1 \cos 3x + C_2 \sin 3x) + \left( \tfrac{1}{9} e^{3x} + 2x + 2 \right). \tag{3.2} \]


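A similar situation holds for systems of linear first-order differential
equations. For example, in order to find the general solution of the
inhomogeneous system
\[
\begin{cases}
\dot{x} = 3x + 2y + 4e^{3t}, \\
\dot{y} = x + 4y - e^{3t},
\end{cases}
\tag{3.3}
\]
which in matrix form becomes
\[
\begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix}
= \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
+ \begin{bmatrix} 4e^{3t} \\ -e^{3t} \end{bmatrix},
\]
we first find the general solution of the corresponding homogeneous system
\[
\begin{cases}
\dot{x} = 3x + 2y, \\
\dot{y} = x + 4y,
\end{cases}
\]
which is the complementary function
\[
\begin{bmatrix} x_c \\ y_c \end{bmatrix}
= \alpha \begin{bmatrix} 1 \\ 1 \end{bmatrix} e^{5t}
+ \beta \begin{bmatrix} -2 \\ 1 \end{bmatrix} e^{2t},
\]
where $\alpha$ and $\beta$ are arbitrary constants (see Example 2.1(a)).

The matrix form is $\dot{\mathbf{x}} = A\mathbf{x} + \mathbf{h}$, where
$A = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}$, $\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}$ and $\mathbf{h} = \begin{bmatrix} 4e^{3t} \\ -e^{3t} \end{bmatrix}$.

We next find a particular solution, or particular integral, of the original
system (3.3), namely
\[
\begin{bmatrix} x_p \\ y_p \end{bmatrix}
= \begin{bmatrix} 3e^{3t} \\ -2e^{3t} \end{bmatrix},
\]
as we shall show in Subsection 3.2. The general solution of the original
system (3.3) is then obtained by adding these two:
\[
\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} x_c \\ y_c \end{bmatrix} + \begin{bmatrix} x_p \\ y_p \end{bmatrix}
= \alpha \begin{bmatrix} 1 \\ 1 \end{bmatrix} e^{5t}
+ \beta \begin{bmatrix} -2 \\ 1 \end{bmatrix} e^{2t}
+ \begin{bmatrix} 3e^{3t} \\ -2e^{3t} \end{bmatrix}.
\]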


To describe the above expression as ‘the general solution’ is perhaps pre- 
mature, because it is not immediately obvious that it is even a solution. In 
order to establish that this is the case, we may use the following general 
result. 


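Theorem 3.1 Principle of superposition

If $\mathbf{x}_1$ is a solution of the system $\dot{\mathbf{x}} = A\mathbf{x} + \mathbf{h}_1$, and $\mathbf{x}_2$ is a solution of
the system $\dot{\mathbf{x}} = A\mathbf{x} + \mathbf{h}_2$, then $p\mathbf{x}_1 + q\mathbf{x}_2$ is a solution of the system
$\dot{\mathbf{x}} = A\mathbf{x} + p\mathbf{h}_1 + q\mathbf{h}_2$, where $p$ and $q$ are constants.

This result is easy to prove, for we have
\[
\begin{aligned}
\frac{d}{dt}(p\mathbf{x}_1 + q\mathbf{x}_2) &= p\dot{\mathbf{x}}_1 + q\dot{\mathbf{x}}_2 \\
&= p(A\mathbf{x}_1 + \mathbf{h}_1) + q(A\mathbf{x}_2 + \mathbf{h}_2) \\
&= A(p\mathbf{x}_1 + q\mathbf{x}_2) + p\mathbf{h}_1 + q\mathbf{h}_2.
\end{aligned}
\]
The particular case that is relevant here corresponds to choosing $\mathbf{h}_1 = \mathbf{0}$ and
$\mathbf{h}_2 = \mathbf{h}$, say, and putting $p = q = 1$, $\mathbf{x}_1 = \mathbf{x}_c$, the complementary function,
and $\mathbf{x}_2 = \mathbf{x}_p$, a particular integral. Then the above result gives rise to the
following theorem.

Theorem 3.2

If $\mathbf{x}_c$ is the complementary function of the homogeneous system
$\dot{\mathbf{x}} = A\mathbf{x}$, and $\mathbf{x}_p$ is a particular integral of the system $\dot{\mathbf{x}} = A\mathbf{x} + \mathbf{h}$,
then $\mathbf{x}_c + \mathbf{x}_p$ is the general solution of the system $\dot{\mathbf{x}} = A\mathbf{x} + \mathbf{h}$.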


*Exercise 3.1
Write down the general solution of the system
\[
\begin{cases}
\dot{x} = 3x + 2y + t, \\
\dot{y} = x + 4y + 7t,
\end{cases}
\]
given that a particular integral is
\[ x_p = t + \tfrac{4}{5}, \quad y_p = -2t - \tfrac{7}{10}. \]


At this stage it is natural to ask how we were able to find the above particular 
integral (although it is easy to verify that it is a solution to the given system, 
by direct substitution). Before we address that question in detail, we should 
emphasize the importance of the principle of superposition. Consider the 
following example, paying particular attention to the form of h, which is 
made up of both exponential and linear terms. 


We use the term particular 
integral rather than 
particular solution. The latter 
is more appropriately used for 
the solution to system (3.3) 
that satisfies given initial or 
boundary conditions. 


The matrix $\begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}$ has
eigenvalues 5 and 2,
corresponding to eigenvectors
$[1 \ \ 1]^T$ and $[-2 \ \ 1]^T$,
respectively.
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Example 3.1 
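Find the general solution of the system
\[
\begin{cases}
\dot{x} = 3x + 2y + 4e^{3t} + 2t, \\
\dot{y} = x + 4y - e^{3t} + 14t.
\end{cases}
\]

Here $\mathbf{h} = \begin{bmatrix} 4e^{3t} + 2t \\ -e^{3t} + 14t \end{bmatrix}$.

Solution
Choosing $\mathbf{h}_1 = [4e^{3t} \ \ -e^{3t}]^T$, we see from the above discussion that
$\mathbf{x}_1 = [3 \ \ -2]^T e^{3t}$ is a solution of $\dot{\mathbf{x}} = A\mathbf{x} + \mathbf{h}_1$. Also, choosing $\mathbf{h}_2 = [t \ \ 7t]^T$
gives $\mathbf{x}_2 = [t + \tfrac{4}{5} \ \ -2t - \tfrac{7}{10}]^T$ as a solution of $\dot{\mathbf{x}} = A\mathbf{x} + \mathbf{h}_2$
(see Equations (3.3) and Exercise 3.1). Thus, from
the principle of superposition, $\mathbf{x}_1 + 2\mathbf{x}_2$ is a particular integral of the given
system written as $\dot{\mathbf{x}} = A\mathbf{x} + \mathbf{h}_1 + 2\mathbf{h}_2$. Hence, by Theorem 3.2, using the
complementary function found above, the general solution is
\[
\begin{bmatrix} x \\ y \end{bmatrix}
= \left( \alpha \begin{bmatrix} 1 \\ 1 \end{bmatrix} e^{5t}
+ \beta \begin{bmatrix} -2 \\ 1 \end{bmatrix} e^{2t} \right)
+ \left( \begin{bmatrix} 3 \\ -2 \end{bmatrix} e^{3t}
+ \begin{bmatrix} 2t + \tfrac{8}{5} \\ -4t - \tfrac{7}{5} \end{bmatrix} \right).
\]

Example 3.1 illustrates a general technique, the principle of which is to break
down the term $\mathbf{h}$ into a sum of manageable components.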


3.2 Finding particular integrals 


We now show you how to find a particular integral $\mathbf{x}_p$ in some special cases.
We consider the system $\dot{\mathbf{x}} = A\mathbf{x} + \mathbf{h}$ in the situations where $\mathbf{h}$ is a vector
whose components are:

(a) polynomial functions;

(b) exponential functions;

(c) sinusoidal functions.

Our treatment will be similar to that of Unit 3, where we found particular
integrals for linear second-order differential equations using the method of
undetermined coefficients. As in that unit, a number of exceptional cases
arise, where our methods need to be slightly modified.

To illustrate the ideas involved, we consider the system
\[ \dot{\mathbf{x}} = A\mathbf{x} + \mathbf{h}, \quad\text{where } A = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}. \tag{3.4} \]

The first stage in solving any inhomogeneous system is to find the
complementary function, that is, the solution of the system $\dot{\mathbf{x}} = A\mathbf{x}$, which as you
saw in Subsection 3.1 is
\[
\begin{bmatrix} x_c \\ y_c \end{bmatrix}
= \alpha \begin{bmatrix} 1 \\ 1 \end{bmatrix} e^{5t}
+ \beta \begin{bmatrix} -2 \\ 1 \end{bmatrix} e^{2t}.
\tag{3.5}
\]


To this complementary function we add a particular integral that depends 
on the form of h. We now look at examples of the above three forms for h, 
and derive a particular integral in each case. 


Example 3.2 


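Find the general solution of the system
\[
\begin{cases}
\dot{x} = 3x + 2y + t, \\
\dot{y} = x + 4y + 7t.
\end{cases}
\]

Here $\mathbf{h} = [t \ \ 7t]^T$, so $\mathbf{h}$ is
linear.

Solution
The complementary function is given in Equation (3.5).

We note that $\mathbf{h}$ consists entirely of linear functions, so it seems natural to
seek a particular integral of the form
\[
\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} at + b \\ ct + d \end{bmatrix},
\]
where $a$, $b$, $c$ and $d$ are constants that we need to determine. We find them
by substituting $x = at + b$, $y = ct + d$ into the above system. This gives the
simultaneous equations
\[
\begin{cases}
a = 3(at + b) + 2(ct + d) + t, \\
c = (at + b) + 4(ct + d) + 7t,
\end{cases}
\]
which give, on rearranging,
\[
\begin{cases}
(3a + 2c + 1)t + (3b + 2d - a) = 0, \\
(a + 4c + 7)t + (b + 4d - c) = 0.
\end{cases}
\tag{3.6}
\]
Equating the coefficients of $t$ to zero in Equations (3.6) gives
\[
\begin{cases}
3a + 2c + 1 = 0, \\
a + 4c + 7 = 0,
\end{cases}
\]
which have the solution
\[ a = 1, \quad c = -2. \]
Equating the constant terms to zero in Equations (3.6), and putting $a = 1$,
$c = -2$, gives the equations
\[
\begin{cases}
3b + 2d - 1 = 0, \\
b + 4d + 2 = 0,
\end{cases}
\]
which have the solution
\[ b = \tfrac{4}{5}, \quad d = -\tfrac{7}{10}. \]

Thus the required particular integral is
\[
\begin{bmatrix} x_p \\ y_p \end{bmatrix}
= \begin{bmatrix} t + \tfrac{4}{5} \\ -2t - \tfrac{7}{10} \end{bmatrix},
\]
and the general solution is
\[
\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} x_c \\ y_c \end{bmatrix} + \begin{bmatrix} x_p \\ y_p \end{bmatrix}
= \alpha \begin{bmatrix} 1 \\ 1 \end{bmatrix} e^{5t}
+ \beta \begin{bmatrix} -2 \\ 1 \end{bmatrix} e^{2t}
+ \begin{bmatrix} t + \tfrac{4}{5} \\ -2t - \tfrac{7}{10} \end{bmatrix}.
\]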
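The coefficient matching in Example 3.2 reduces to two small linear systems. Writing the trial solution as $\mathbf{p}t + \mathbf{q}$ and the forcing term as $\mathbf{h}_1 t + \mathbf{h}_0$, equating coefficients of $t$ gives $A\mathbf{p} = -\mathbf{h}_1$, and equating constant terms gives $A\mathbf{q} = \mathbf{p} - \mathbf{h}_0$. The sketch below, with hypothetical helper names `solve_2x2` and `linear_particular_integral` of our own choosing, carries this out in exact arithmetic for a $2 \times 2$ system; it assumes that $A$ is invertible.

```python
from fractions import Fraction


def solve_2x2(m, rhs):
    """Solve m * u = rhs for a 2 x 2 matrix m by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("matrix is singular; a different trial solution is needed")
    u0 = Fraction(rhs[0] * m[1][1] - m[0][1] * rhs[1]) / det
    u1 = Fraction(m[0][0] * rhs[1] - rhs[0] * m[1][0]) / det
    return [u0, u1]


def linear_particular_integral(A, h1, h0):
    """Return (p, q) with x_p = p*t + q for the system x' = A x + h1*t + h0.

    Coefficients of t give A p = -h1; constant terms give A q = p - h0.
    """
    p = solve_2x2(A, [-h1[0], -h1[1]])
    q = solve_2x2(A, [p[0] - h0[0], p[1] - h0[1]])
    return p, q


# Data of Example 3.2: h = [t, 7t]^T, so h1 = [1, 7] and h0 = [0, 0].
A = [[3, 2], [1, 4]]
p, q = linear_particular_integral(A, [1, 7], [0, 0])
print([str(c) for c in p], [str(c) for c in q])  # ['1', '-2'] ['4/5', '-7/10']
```

Exact fractions are used so that the output can be compared directly with the values of $a$, $c$, $b$ and $d$ found by hand.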


*Exercise 3.2

Find the general solution of the system
\[
\begin{cases}
\dot{x} = x + 4y - t + 2, \\
\dot{y} = x - 2y + 5t.
\end{cases}
\]

You may have been tempted
to use a simpler trial solution,
of the form
\[ \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} at \\ ct \end{bmatrix}. \]
Unfortunately, this does not
work; try it and see! You
may recall something similar
in Unit 3.


These equations hold for all 
values of t, which means that 
each of the bracketed terms 
must be zero. 


For the complementary 
function, see Example 2.2. 
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Example 3.3 
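Find the general solution of the system
\[
\begin{cases}
\dot{x} = 3x + 2y + 4e^{3t}, \\
\dot{y} = x + 4y - e^{3t}.
\end{cases}
\]

Solution

The complementary function is given in Equation (3.5). We note that $\mathbf{h}$
consists entirely of exponentials, so it seems natural to seek a particular
integral of the form
\[
\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} ae^{3t} \\ be^{3t} \end{bmatrix},
\]
where $a$ and $b$ are constants that we need to determine. We find them
by substituting $x = ae^{3t}$, $y = be^{3t}$ into the above system. This gives the
simultaneous equations
\[
\begin{cases}
3ae^{3t} = 3ae^{3t} + 2be^{3t} + 4e^{3t}, \\
3be^{3t} = ae^{3t} + 4be^{3t} - e^{3t},
\end{cases}
\]
or, on dividing by $e^{3t}$,
\[
\begin{cases}
3a = 3a + 2b + 4, \\
3b = a + 4b - 1.
\end{cases}
\]
Rearranging these equations gives
\[
\begin{cases}
2b = -4, \\
a + b = 1,
\end{cases}
\]
which have the solution
\[ a = 3, \quad b = -2. \]

Thus the required particular integral is
\[
\begin{bmatrix} x_p \\ y_p \end{bmatrix}
= \begin{bmatrix} 3e^{3t} \\ -2e^{3t} \end{bmatrix},
\]
and, by Theorem 3.2, the general solution is obtained by adding this to the
complementary function in Equation (3.5).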

Exercise 3.3 
Find the general solution of the system 


g=x2+4y+4et, 
y=a2—2y+5e. 


Example 3.4 
Find the general solution of the system
\[
\begin{cases}
\dot{x} = 3x + 2y - 2 \sin 3t, \\
\dot{y} = x + 4y - 3 \sin 3t - 7 \cos 3t.
\end{cases}
\]

Here $\mathbf{h} = \begin{bmatrix} -2 \sin 3t \\ -3 \sin 3t - 7 \cos 3t \end{bmatrix}$,
so $\mathbf{h}$ is sinusoidal.

In Example 3.3, $\mathbf{h} = [4e^{3t} \ \ -e^{3t}]^T$,
so $\mathbf{h}$ is exponential.

The method of Example 3.3 fails when $\mathbf{h}$
involves $e^{5t}$ or $e^{2t}$, which
occur in the complementary
function. We deal with such
examples in the 'Exceptional
cases' subsection below. You
may recall similar failures in
Unit 3.

For Exercise 3.3, the complementary function
is the same as that of
Exercise 3.2.


Solution 


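The complementary function is given in Equation (3.5). We note that $\mathbf{h}$
consists entirely of sine and cosine terms, so it seems natural to seek a
particular integral of the form
\[
\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} a \sin 3t + b \cos 3t \\ c \sin 3t + d \cos 3t \end{bmatrix},
\]
where $a$, $b$, $c$ and $d$ are constants that we need to determine. We find them
by substituting $x = a \sin 3t + b \cos 3t$, $y = c \sin 3t + d \cos 3t$ into the above
system. This gives the simultaneous equations
\[
\begin{cases}
(2 - 3a - 3b - 2c) \sin 3t + (3a - 3b - 2d) \cos 3t = 0, \\
(3 - a - 4c - 3d) \sin 3t + (7 - b + 3c - 4d) \cos 3t = 0.
\end{cases}
\]
These equations hold for all values of $t$, so each of the bracketed terms must
be zero, which gives the four simultaneous equations
\[
\begin{cases}
3a + 3b + 2c = 2, \\
3a - 3b - 2d = 0, \\
a + 4c + 3d = 3, \\
b - 3c + 4d = 7.
\end{cases}
\tag{3.7}
\]

Using the Gaussian elimination method, as in Unit 9, we can solve these
equations to give $a = \tfrac{211}{221}$, $b = \tfrac{11}{221}$, $c = -\tfrac{112}{221}$, $d = \tfrac{300}{221}$.

Thus the required particular integral is
\[
\begin{bmatrix} x_p \\ y_p \end{bmatrix}
= \frac{1}{221} \begin{bmatrix} 211 \sin 3t + 11 \cos 3t \\ -112 \sin 3t + 300 \cos 3t \end{bmatrix},
\]
and the general solution is
\[
\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} x_c \\ y_c \end{bmatrix} + \begin{bmatrix} x_p \\ y_p \end{bmatrix}
= \alpha \begin{bmatrix} 1 \\ 1 \end{bmatrix} e^{5t}
+ \beta \begin{bmatrix} -2 \\ 1 \end{bmatrix} e^{2t}
+ \frac{1}{221} \begin{bmatrix} 211 \sin 3t + 11 \cos 3t \\ -112 \sin 3t + 300 \cos 3t \end{bmatrix}.
\]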
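The text omits the elimination that solves Equations (3.7). As a check, the following sketch solves the four equations by Gauss-Jordan elimination in exact rational arithmetic. The routine `solve_linear_system` is a small illustrative helper written for this purpose, not a function of the course software.

```python
from fractions import Fraction


def solve_linear_system(matrix, rhs):
    """Solve matrix * u = rhs by Gauss-Jordan elimination in exact arithmetic."""
    n = len(rhs)
    # Augmented matrix with Fraction entries so that no rounding occurs.
    rows = [[Fraction(entry) for entry in row] + [Fraction(b)]
            for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero pivot.
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("system is singular")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [entry / pivot_value for entry in rows[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [row[n] for row in rows]


# Equations (3.7) in the unknowns (a, b, c, d).
coefficients = [
    [3, 3, 2, 0],    # 3a + 3b + 2c      = 2
    [3, -3, 0, -2],  # 3a - 3b      - 2d = 0
    [1, 0, 4, 3],    #  a      + 4c + 3d = 3
    [0, 1, -3, 4],   #       b - 3c + 4d = 7
]
a, b, c, d = solve_linear_system(coefficients, [2, 0, 3, 7])
print(a, b, c, d)  # 211/221 11/221 -112/221 300/221
```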


*Exercise 3.4
Find the general solution of the system
\[
\begin{cases}
\dot{x} = x + 4y - \cos 2t - 4 \sin 2t, \\
\dot{y} = x - 2y + \sin 2t.
\end{cases}
\]


Procedure 3.1 Particular integral

To find a particular integral $\mathbf{x}_p = [x_p \ \ y_p]^T$ for the system $\dot{\mathbf{x}} = A\mathbf{x} + \mathbf{h}$,
do the following.

(a) When the elements of $\mathbf{h}$ are polynomials of degree less than or
equal to $k$, choose $x_p$ and $y_p$ to be polynomials of degree $k$.

(b) When the elements of $\mathbf{h}$ are multiples of the same exponential
function, choose $x_p$ and $y_p$ to be multiples of this exponential function.

(c) When the elements of $\mathbf{h}$ are linear combinations of the sinusoidal
functions $\sin \omega t$ and $\cos \omega t$, choose $x_p$ and $y_p$ to be linear
combinations of these functions.

To determine the coefficients in $x_p$ and $y_p$, substitute into the system
of differential equations and equate coefficients.


Note that we need both sin 3t 
and cos 3t in each component 
of the particular integral. 


We omit the details, but you 
can easily verify that this is 
correct by substituting these 
values for a, b, c and d into 
Equations (3.7). 


The complementary function 
is the same as that of 
Exercises 3.2 and 3.3. 


Note that we need both sin wt 
and cos wt in each component 
of the particular integral. 
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Exceptional cases 


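In Unit 3 we discussed the differential equation
\[ \frac{d^2 y}{dx^2} - 4y = 2e^{2x}. \]

See Unit 3, Example 2.6.

The complementary function is $y_c = \alpha e^{-2x} + \beta e^{2x}$, where $\alpha$ and $\beta$ are
arbitrary constants. For the particular integral, it would be natural to try
$y = ke^{2x}$, where $k$ is a constant to be determined. However, as you saw in
Unit 3, this fails since $e^{2x}$ is already included in the complementary
function. Instead, we insert an extra factor $x$, and try a particular integral of
the form $y = kxe^{2x}$.

A similar situation holds for systems of linear differential equations in cases
where the usual trial solution is part of the complementary function, as we
show in the following example.

Example 3.5
Find the general solution of the system
\[
\begin{cases}
\dot{x} = 3x + 2y + 6e^{2t}, \\
\dot{y} = x + 4y + 3e^{2t}.
\end{cases}
\tag{3.8}
\]

Here $\mathbf{h} = [6e^{2t} \ \ 3e^{2t}]^T$.

Solution

The complementary function is
\[
\begin{bmatrix} x_c \\ y_c \end{bmatrix}
= \alpha \begin{bmatrix} 1 \\ 1 \end{bmatrix} e^{5t}
+ \beta \begin{bmatrix} -2 \\ 1 \end{bmatrix} e^{2t},
\]
where $\alpha$ and $\beta$ are arbitrary constants (see Equation (3.5)). Since the complementary function
includes an $e^{2t}$ term, a particular integral of the form
\[
\begin{bmatrix} x_p \\ y_p \end{bmatrix}
= \begin{bmatrix} a_1 e^{2t} \\ a_2 e^{2t} \end{bmatrix}
\]
will not work. For, if we substitute this into the system of equations, we
obtain
\[
\begin{cases}
2a_1 e^{2t} = 3a_1 e^{2t} + 2a_2 e^{2t} + 6e^{2t}, \\
2a_2 e^{2t} = a_1 e^{2t} + 4a_2 e^{2t} + 3e^{2t},
\end{cases}
\]
or, on dividing by $e^{2t}$ and rearranging,
\[
\begin{cases}
a_1 + 2a_2 = -6, \\
a_1 + 2a_2 = -3.
\end{cases}
\]
These equations clearly have no solution, so the method fails.

However, there is a method that will succeed. First write $\mathbf{v}_1 = [1 \ \ 1]^T$,
$\mathbf{v}_2 = [-2 \ \ 1]^T$, $\lambda_1 = 5$, $\lambda_2 = 2$, so that $\mathbf{v}_1$, $\mathbf{v}_2$ are eigenvectors corresponding
respectively to the eigenvalues $\lambda_1$, $\lambda_2$, and write $\mathbf{h} = \mathbf{k} e^{\lambda_2 t}$, where $\mathbf{k} = [6 \ \ 3]^T$.
Now $\mathbf{k}$ can be written as a linear combination of the linearly independent
eigenvectors $\mathbf{v}_1$ and $\mathbf{v}_2$:
\[ \mathbf{k} = p\mathbf{v}_1 + q\mathbf{v}_2. \tag{3.9} \]

Any $n$-dimensional vector $\mathbf{v}$
can be written as a linear
combination of $n$ linearly
independent $n$-dimensional
vectors.

In this case, $[6 \ \ 3]^T = p[1 \ \ 1]^T + q[-2 \ \ 1]^T$, so $p = 4$ and $q = -1$. Next we
look for a trial solution of the form
\[ \mathbf{x} = (a\mathbf{v}_1 + b\mathbf{v}_2 t) e^{\lambda_2 t}. \tag{3.10} \]

Substituting this and $\mathbf{h} = \mathbf{k} e^{\lambda_2 t} = p\mathbf{v}_1 e^{\lambda_2 t} + q\mathbf{v}_2 e^{\lambda_2 t}$ into $\dot{\mathbf{x}} = A\mathbf{x} + \mathbf{h}$ gives
\[
b\mathbf{v}_2 e^{\lambda_2 t} + \lambda_2 (a\mathbf{v}_1 + b\mathbf{v}_2 t) e^{\lambda_2 t}
= A(a\mathbf{v}_1 + b\mathbf{v}_2 t) e^{\lambda_2 t} + p\mathbf{v}_1 e^{\lambda_2 t} + q\mathbf{v}_2 e^{\lambda_2 t}.
\]

Dividing by $e^{\lambda_2 t}$ and using the fact that $A\mathbf{v}_1 = \lambda_1 \mathbf{v}_1$ and $A\mathbf{v}_2 = \lambda_2 \mathbf{v}_2$, we
obtain
\[ a(\lambda_2 - \lambda_1)\mathbf{v}_1 + b\mathbf{v}_2 = p\mathbf{v}_1 + q\mathbf{v}_2. \]
Equating the coefficients of $\mathbf{v}_1$ and $\mathbf{v}_2$ gives
\[ a = p/(\lambda_2 - \lambda_1) \quad\text{and}\quad b = q. \]

We know $\lambda_1 = 5$, $\lambda_2 = 2$, $p = 4$ and $q = -1$, so $a = -\tfrac{4}{3}$ and $b = -1$, and a
particular integral is
\[
\begin{bmatrix} x_p \\ y_p \end{bmatrix}
= \left( -\tfrac{4}{3} \begin{bmatrix} 1 \\ 1 \end{bmatrix} - \begin{bmatrix} -2 \\ 1 \end{bmatrix} t \right) e^{2t}
= \begin{bmatrix} 2t - \tfrac{4}{3} \\ -t - \tfrac{4}{3} \end{bmatrix} e^{2t}.
\]

So the general solution of Equations (3.8) is
\[
\begin{bmatrix} x \\ y \end{bmatrix}
= \alpha \begin{bmatrix} 1 \\ 1 \end{bmatrix} e^{5t}
+ \beta \begin{bmatrix} -2 \\ 1 \end{bmatrix} e^{2t}
+ \begin{bmatrix} 2t - \tfrac{4}{3} \\ -t - \tfrac{4}{3} \end{bmatrix} e^{2t}.
\]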


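Procedure 3.2 Particular integral: special case

To find a particular integral $\mathbf{x}_p = [x_p \ \ y_p]^T$ for the system $\dot{\mathbf{x}} = A\mathbf{x} + \mathbf{h}$,
where $A$ has distinct real eigenvalues $\lambda_1$ and $\lambda_2$, with corresponding
eigenvectors $\mathbf{v}_1$ and $\mathbf{v}_2$, respectively, and $\mathbf{h} = \mathbf{k} e^{\lambda_2 t}$, first determine $p$
and $q$ such that
\[ \mathbf{k} = p\mathbf{v}_1 + q\mathbf{v}_2. \]
Then a particular integral has the form
\[ \mathbf{x}_p = (a\mathbf{v}_1 + b\mathbf{v}_2 t) e^{\lambda_2 t}, \]
where $a = p/(\lambda_2 - \lambda_1)$ and $b = q$.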
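Procedure 3.2 can be sketched in a few lines for the $2 \times 2$ case. The code below finds $p$ and $q$ by Cramer's rule and then returns $a$ and $b$; the function name `exceptional_particular_integral` and its argument order are our own assumptions, and integer inputs are assumed so that exact fractions can be used.

```python
from fractions import Fraction


def exceptional_particular_integral(v1, v2, lam1, lam2, k):
    """Return (a, b) such that x_p = (a*v1 + b*v2*t) * exp(lam2 * t).

    v1, v2 are eigenvectors for the distinct eigenvalues lam1, lam2,
    and the forcing term is h = k * exp(lam2 * t).
    """
    # Solve k = p*v1 + q*v2 for p and q by Cramer's rule.
    det = v1[0] * v2[1] - v2[0] * v1[1]
    if det == 0:
        raise ValueError("v1 and v2 must be linearly independent")
    p = Fraction(k[0] * v2[1] - v2[0] * k[1], det)
    q = Fraction(v1[0] * k[1] - k[0] * v1[1], det)
    if lam1 == lam2:
        raise ValueError("the eigenvalues must be distinct")
    a = p / (lam2 - lam1)
    b = q
    return a, b


# Data of Example 3.5: k = [6, 3]^T, eigenpairs (5, [1, 1]) and (2, [-2, 1]).
a, b = exceptional_particular_integral([1, 1], [-2, 1], 5, 2, [6, 3])
print(a, b)  # -4/3 -1
```

With these values, the components of the particular integral are $(a \cdot 1 + b \cdot (-2) t) e^{2t}$ and $(a \cdot 1 + b \cdot 1 \cdot t) e^{2t}$, which agrees with the result of Example 3.5.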


A similar procedure can be followed if h contains polynomial or sinusoidal 
components, but we do not go into the details here. 


End-of-section Exercises 


Exercise 3.5 
Solve the system of differential equations 


&=2e+3y+ e%, 
g=2e+ yt 4e, 


subject to the initial conditions x(0) = 2, y(0) = 2. 


Exercise 3.6 
Find the general solution of the system of differential equations 
£=2e+3yt+ e*, 
y= 2e+ yt2e*. 
*Exercise 3.7 


Find the general solution of the system of differential equations 


\[
\begin{cases}
\dot{x} = 2x + 3y + t, \\
\dot{y} = 2x + y + \sin t.
\end{cases}
\]


(Hint: Use the principle of superposition.) 


Equating coefficients is
appropriate as $\mathbf{v}_1$ and $\mathbf{v}_2$ are
linearly independent. Also, $a$
is well defined since $\lambda_1 \neq \lambda_2$.


The matrix $\begin{bmatrix} 2 & 3 \\ 2 & 1 \end{bmatrix}$ has
eigenvectors $[1 \ \ -1]^T$ and
$[3 \ \ 2]^T$, corresponding to
eigenvalues $-1$ and 4,
respectively.


See the margin note for 
Exercise 3.5. 


See the margin note for 
Exercise 3.5. 
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4 Second-order systems 


In this section we show how the methods introduced earlier in this unit 
can be adapted to finding the solutions of certain systems of homogeneous 
second-order differential equations. We then consider a particular case in 
which the solutions are all sinusoidal. Such cases arise often in connection 
with oscillating mechanical systems or electrical circuits. 


4.1 Homogeneous second-order systems 


We now turn our attention to systems of linear constant-coefficient
second-order differential equations of the form $\ddot{\mathbf{x}} = A\mathbf{x}$. You have seen that the
general solution of a first-order system $\dot{\mathbf{x}} = A\mathbf{x}$ of $n$ equations can be
written as a linear combination of $n$ linearly independent solutions involving
$n$ arbitrary constants. In a similar way, it can be shown that the general
solution of a second-order system $\ddot{\mathbf{x}} = A\mathbf{x}$ of $n$ equations can be written as
a linear combination of $2n$ linearly independent solutions involving $2n$
arbitrary constants. (We shall not prove this.) The following example will show you what is involved in
the solution of such systems; the treatment is similar to that of Example 2.1.

Example 4.1
Find the general solution of the system of differential equations
\[
\begin{cases}
\ddot{x} = x + 4y, \\
\ddot{y} = x - 2y.
\end{cases}
\tag{4.1}
\]

Solution

In order to find the general solution of this pair of equations, it is sufficient to
find four linearly independent solutions and write down an arbitrary linear
combination of them.

So we begin by attempting to find solutions of the form
\[ x = Ce^{\mu t}, \quad y = De^{\mu t}, \]
where $C$ and $D$ are constants. (The reason for using $\mu$ instead of $\lambda$ will become
apparent as we proceed.)

Since $\ddot{x} = C\mu^2 e^{\mu t}$ and $\ddot{y} = D\mu^2 e^{\mu t}$, we have, on substituting the expressions
for $x$ and $y$ into Equations (4.1),
\[
\begin{cases}
C\mu^2 e^{\mu t} = Ce^{\mu t} + 4De^{\mu t}, \\
D\mu^2 e^{\mu t} = Ce^{\mu t} - 2De^{\mu t}.
\end{cases}
\]

Cancelling the $e^{\mu t}$ terms, we obtain
\[
\begin{bmatrix} 1 & 4 \\ 1 & -2 \end{bmatrix}
\begin{bmatrix} C \\ D \end{bmatrix}
= \mu^2 \begin{bmatrix} C \\ D \end{bmatrix},
\]
so $\mu^2$ is an eigenvalue of $A = \begin{bmatrix} 1 & 4 \\ 1 & -2 \end{bmatrix}$. However, from Example 2.2 we
know that the eigenvalues of $A$ are 2 and $-3$, so $\mu = \pm\sqrt{2}$ and $\mu = \pm\sqrt{3}\,i$.

The eigenvalue 2 corresponds to an eigenvector $[4 \ \ 1]^T$, and it is easy to
verify that the values $\mu = \pm\sqrt{2}$ provide us with two linearly independent
solutions of Equations (4.1), namely
\[
\mathbf{x}_1 = \begin{bmatrix} 4 \\ 1 \end{bmatrix} e^{\sqrt{2}\,t}
\quad\text{and}\quad
\mathbf{x}_2 = \begin{bmatrix} 4 \\ 1 \end{bmatrix} e^{-\sqrt{2}\,t}.
\]

The verification can be done
by direct substitution.

The eigenvalue $-3$ corresponds to an eigenvector $[1 \ \ -1]^T$, and choosing
$\mu = \sqrt{3}\,i$ gives a further solution
\[
\begin{bmatrix} 1 \\ -1 \end{bmatrix} e^{\sqrt{3}\,it}
= \begin{bmatrix} 1 \\ -1 \end{bmatrix} (\cos \sqrt{3}\,t + i \sin \sqrt{3}\,t);
\]
we can verify that the real and imaginary parts of the expression on the
right-hand side are both solutions of Equations (4.1). Thus we have found
two more linearly independent solutions of Equations (4.1), namely
\[
\mathbf{x}_3 = \begin{bmatrix} 1 \\ -1 \end{bmatrix} \cos \sqrt{3}\,t
\quad\text{and}\quad
\mathbf{x}_4 = \begin{bmatrix} 1 \\ -1 \end{bmatrix} \sin \sqrt{3}\,t.
\]

Here we use Euler's formula
to write down the exponential
in terms of sinusoids.

Choosing $\mu = -\sqrt{3}\,i$ leads to
the same two linearly
independent solutions.

Using a version of the principle of superposition applicable to second-order
systems, we can now take linear combinations of $\mathbf{x}_1$, $\mathbf{x}_2$, $\mathbf{x}_3$ and $\mathbf{x}_4$ to find
further solutions. (We do not prove this.) The expression
\[ \mathbf{x} = C_1 \mathbf{x}_1 + C_2 \mathbf{x}_2 + C_3 \mathbf{x}_3 + C_4 \mathbf{x}_4, \]
where $C_1$, $C_2$, $C_3$, $C_4$ are arbitrary constants, is the general solution of
Equations (4.1).

Note that there are four
arbitrary constants, as
expected.


Comparing the above solution with that of Example 2.2, we notice many
similarities. The main difference is that $\lambda$ is replaced by $\mu^2$, giving rise to
four values for $\mu$, instead of two for $\lambda$. Consequently, we obtain a general
solution with four arbitrary constants.

In general, consider a system of differential equations of the form $\ddot{\mathbf{x}} = A\mathbf{x}$. If
we try an exponential solution of the form $\mathbf{x} = \mathbf{v} e^{\mu t}$, where $\mathbf{v}$ is a constant
column vector, then $\ddot{\mathbf{x}} = \mathbf{v} \mu^2 e^{\mu t}$, and the system of differential equations
becomes $\mathbf{v} \mu^2 e^{\mu t} = A\mathbf{v} e^{\mu t}$. Dividing this equation by $e^{\mu t}$ and rearranging,
we have
\[ A\mathbf{v} = \mu^2 \mathbf{v}. \]

Thus $\mathbf{v}$ is an eigenvector of $A$, and $\mu^2$ is the corresponding eigenvalue.

This discussion mirrors the
corresponding discussion in
Section 2 (page 110).

Theorem 4.1
If $\mu^2$ is an eigenvalue of the matrix $A$ corresponding to an eigenvector
$\mathbf{v}$, then $\mathbf{x} = \mathbf{v} e^{\mu t}$ is a solution of the system of differential equations
$\ddot{\mathbf{x}} = A\mathbf{x}$.

If $\mathbf{x} = \mathbf{v} e^{\mu t}$, then $\dot{\mathbf{x}} = \mu \mathbf{v} e^{\mu t}$
and
$\ddot{\mathbf{x}} = \mu^2 \mathbf{v} e^{\mu t} = A\mathbf{v} e^{\mu t} = A\mathbf{x}$.
Example 4.2 
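Find the general solution of the system of differential equations
\[
\begin{cases}
\ddot{x} = 3x + 2y, \\
\ddot{y} = x + 4y.
\end{cases}
\]
Solution

The matrix of coefficients is $A = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}$. The eigenvectors of $A$ are $[1 \ \ 1]^T$
and $[-2 \ \ 1]^T$, corresponding to the eigenvalues $\lambda = 5$ and $\lambda = 2$,
respectively.

Using the notation of Example 4.1, it follows that $\mu$ has the values $\sqrt{5}$, $-\sqrt{5}$,
$\sqrt{2}$ and $-\sqrt{2}$, and that the general solution is
\[
\begin{bmatrix} x \\ y \end{bmatrix}
= C_1 \begin{bmatrix} 1 \\ 1 \end{bmatrix} e^{\sqrt{5}\,t}
+ C_2 \begin{bmatrix} 1 \\ 1 \end{bmatrix} e^{-\sqrt{5}\,t}
+ C_3 \begin{bmatrix} -2 \\ 1 \end{bmatrix} e^{\sqrt{2}\,t}
+ C_4 \begin{bmatrix} -2 \\ 1 \end{bmatrix} e^{-\sqrt{2}\,t}.
\]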
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*Exercise 4.1 


Find the general solution of the system of differential equations 


\[
\begin{cases}
\ddot{x} = 5x + 2y, \\
\ddot{y} = 2x + 5y.
\end{cases}
\]


The above ideas can be formalized in the following procedure. 


Procedure 4.1 Second-order homogeneous linear systems

To solve a system $\ddot{\mathbf{x}} = A\mathbf{x}$, where $A$ is an $n \times n$ matrix with $n$ distinct
real eigenvalues, do the following.

(a) Find the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ of $A$, and a corresponding set
of eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$.

(b) Each positive eigenvalue, $\mu^2$ say, corresponding to an eigenvector $\mathbf{v}$,
gives rise to two linearly independent solutions
\[ \mathbf{v} e^{\mu t} \quad\text{and}\quad \mathbf{v} e^{-\mu t}. \]

Each negative eigenvalue, $-\omega^2$ say, corresponding to an eigenvector
$\mathbf{v}$, gives rise to two linearly independent solutions
\[ \mathbf{v} \cos \omega t \quad\text{and}\quad \mathbf{v} \sin \omega t. \]

A zero eigenvalue corresponding to an eigenvector $\mathbf{v}$ gives rise to
two linearly independent solutions
\[ \mathbf{v} \quad\text{and}\quad \mathbf{v} t. \]

The general solution is then an arbitrary linear combination of the
$2n$ linearly independent solutions found in step (b), involving $2n$
arbitrary constants.
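Step (b) of Procedure 4.1 is a three-way case split on the sign of each eigenvalue, which can be sketched as follows. The helper names `basis_pair` and `general_solution` are illustrative assumptions, and the eigenvalues and eigenvectors are taken as given rather than computed. The constants are supplied in the order $C_1, C_2, \ldots, C_{2n}$, two per eigenvalue.

```python
import math


def basis_pair(eigenvalue):
    """Return the two scalar functions of t associated with one real eigenvalue."""
    if eigenvalue > 0:
        mu = math.sqrt(eigenvalue)          # eigenvalue = mu^2
        return (lambda t: math.exp(mu * t)), (lambda t: math.exp(-mu * t))
    if eigenvalue < 0:
        omega = math.sqrt(-eigenvalue)      # eigenvalue = -omega^2
        return (lambda t: math.cos(omega * t)), (lambda t: math.sin(omega * t))
    return (lambda t: 1.0), (lambda t: t)   # zero eigenvalue: v and v*t


def general_solution(eigenpairs, constants, t):
    """Evaluate the general solution of x'' = A x at time t.

    eigenpairs is a list of (eigenvalue, eigenvector) pairs and constants
    holds two arbitrary constants for each pair, in order.
    """
    n = len(eigenpairs[0][1])
    x = [0.0] * n
    for index, (eigenvalue, vector) in enumerate(eigenpairs):
        f, g = basis_pair(eigenvalue)
        scalar = constants[2 * index] * f(t) + constants[2 * index + 1] * g(t)
        for i in range(n):
            x[i] += vector[i] * scalar
    return x


# Eigenpairs quoted in Example 4.2 for A = [[3, 2], [1, 4]].
eigenpairs = [(5.0, [1.0, 1.0]), (2.0, [-2.0, 1.0])]
print(general_solution(eigenpairs, [1.0, 0.0, 0.0, 1.0], 0.0))  # [-1.0, 2.0]
```

Complex and repeated eigenvalues are not handled, in line with the restriction stated in the procedure.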


We illustrate this procedure in the following example. 


Example 4.3 
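Find the general solution of the following system of differential equations.
\[
\begin{cases}
\ddot{x} = 3x + 2y + 2z \\
\ddot{y} = 2x + 2y \\
\ddot{z} = 2x + 4z
\end{cases}
\]

Solution

The matrix of coefficients is $A = \begin{bmatrix} 3 & 2 & 2 \\ 2 & 2 & 0 \\ 2 & 0 & 4 \end{bmatrix}$. The eigenvectors of $A$ are
$[2 \ \ 1 \ \ 2]^T$, corresponding to the eigenvalue $\lambda = 6$, $[1 \ \ 2 \ \ -2]^T$,
corresponding to the eigenvalue $\lambda = 3$, and $[-2 \ \ 2 \ \ 1]^T$, corresponding to the
eigenvalue $\lambda = 0$. It follows from Procedure 4.1 that the general solution of
the above system is
\[
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
= \begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix} \left( C_1 e^{\sqrt{6}\,t} + C_2 e^{-\sqrt{6}\,t} \right)
+ \begin{bmatrix} 1 \\ 2 \\ -2 \end{bmatrix} \left( C_3 e^{\sqrt{3}\,t} + C_4 e^{-\sqrt{3}\,t} \right)
+ \begin{bmatrix} -2 \\ 2 \\ 1 \end{bmatrix} (C_5 + C_6 t).
\]

In Exercise 4.1, the matrix $\begin{bmatrix} 5 & 2 \\ 2 & 5 \end{bmatrix}$ has
eigenvectors $[1 \ \ 1]^T$ and
$[1 \ \ -1]^T$, corresponding to
eigenvalues 7 and 3,
respectively.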


Complex eigenvalues and 
repeated real eigenvalues are 
not discussed here, but they 
can be dealt with in a fashion 
similar to that for the 
first-order case. 


We do not show this here, but 
you can verify it in any 
particular case (see 

Example 4.3 below). It is 
analogous to the case of a 
single second-order 
differential equation with 
both roots of the auxiliary 
equation equal to zero. 


You may like to verify that
$[-2 \ \ 2 \ \ 1]^T$ and
$[-2 \ \ 2 \ \ 1]^T t$ are both
solutions of the system.


Exercise 4.2
Find the general solution of the system of differential equations
\[
\begin{cases}
\ddot{x} = 2x + y - z, \\
\ddot{y} = -3y + 2z, \\
\ddot{z} = 4z.
\end{cases}
\]


4.2 Simple harmonic motion 


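Simple harmonic motion is an often observed phenomenon. It arises, for
example, if a quantity satisfies a second-order differential equation of the
form
\[ \ddot{x} + \omega^2 x = 0, \]
where $\omega$ is a constant. In this case solutions are of the form
\[ x = \alpha \cos \omega t + \beta \sin \omega t, \]
where $\alpha$ and $\beta$ are arbitrary constants.

In Subsection 1.3 we developed a model for the horizontal motion of a
ball-bearing within a bowl of specified shape. The resulting equations were
\[
\begin{cases}
\ddot{x} = -5x + 4y, \\
\ddot{y} = 4x - 5y,
\end{cases}
\]
where $x(t)$ and $y(t)$ are the horizontal coordinates of the ball-bearing at
time $t$. These second-order differential equations may be expressed in matrix
form as
\[
\begin{bmatrix} \ddot{x} \\ \ddot{y} \end{bmatrix}
= \begin{bmatrix} -5 & 4 \\ 4 & -5 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}.
\]
The matrix $\begin{bmatrix} -5 & 4 \\ 4 & -5 \end{bmatrix}$ has eigenvectors $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 1 \\ -1 \end{bmatrix}$, corresponding to
eigenvalues $-1$ and $-9$, respectively.

It follows from Procedure 4.1 that the general solution is
\[
\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} 1 \\ 1 \end{bmatrix} (C_1 \cos t + C_2 \sin t)
+ \begin{bmatrix} 1 \\ -1 \end{bmatrix} (C_3 \cos 3t + C_4 \sin 3t),
\]
where $C_1$, $C_2$, $C_3$ and $C_4$ are arbitrary constants.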


Let us now consider the paths that a ball-bearing takes for given initial
conditions. The following illustrates just four of the many possibilities.

(a) $x(0) = 1$, $y(0) = 1$, $\dot{x}(0) = 0$, $\dot{y}(0) = 0$.
(b) $x(0) = 1$, $y(0) = -1$, $\dot{x}(0) = 0$, $\dot{y}(0) = 0$.
(c) $x(0) = 1$, $y(0) = 0$, $\dot{x}(0) = 0$, $\dot{y}(0) = 0$.
(d) $x(0) = 1$, $y(0) = 2$, $\dot{x}(0) = -1$, $\dot{y}(0) = 1$.

In case (a) we find $C_1 = 1$ and $C_2 = C_3 = C_4 = 0$, so the solution is
$[x \ \ y]^T = [1 \ \ 1]^T \cos t$, and the ball-bearing performs simple harmonic motion in the
direction of the vector $[1 \ \ 1]^T$, that is, along the line $y = x$, with angular
frequency 1 (as shown in Figure 4.1).

In Exercise 4.2, the matrix $\begin{bmatrix} 2 & 1 & -1 \\ 0 & -3 & 2 \\ 0 & 0 & 4 \end{bmatrix}$
has eigenvalues 2, $-3$ and 4,
corresponding to eigenvectors
$[1 \ \ 0 \ \ 0]^T$, $[1 \ \ -5 \ \ 0]^T$ and
$[-5 \ \ 4 \ \ 14]^T$, respectively.

In case (b) we find $C_3 = 1$ and $C_1 = C_2 = C_4 = 0$, so the solution is
$[x \ \ y]^T = [1 \ \ -1]^T \cos 3t$, and the ball-bearing performs simple harmonic motion in
the direction of the vector $[1 \ \ -1]^T$, that is, along the line $y = -x$, with
angular frequency 3 (as shown in Figure 4.2).

Figure 4.1 Figure 4.2

In case (c) we find $C_1 = C_3 = \tfrac{1}{2}$ and $C_2 = C_4 = 0$, so the solution is
$[x \ \ y]^T = \tfrac{1}{2} [1 \ \ 1]^T \cos t + \tfrac{1}{2} [1 \ \ -1]^T \cos 3t$, which is a combination of the previous
motions (as shown in Figure 4.3).

In case (d) we find $C_1 = \tfrac{3}{2}$, $C_3 = -\tfrac{1}{2}$, $C_2 = 0$ and $C_4 = -\tfrac{1}{3}$, so the solution
is $[x \ \ y]^T = [1 \ \ 1]^T \left( \tfrac{3}{2} \cos t \right) + [1 \ \ -1]^T \left( -\tfrac{1}{2} \cos 3t - \tfrac{1}{3} \sin 3t \right)$ (as shown in
Figure 4.4).
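The constants in cases (a) to (d) all come from the same four linear equations. Setting $t = 0$ in the general solution gives $x(0) = C_1 + C_3$ and $y(0) = C_1 - C_3$, and differentiating first gives $\dot{x}(0) = C_2 + 3C_4$ and $\dot{y}(0) = C_2 - 3C_4$. The following sketch solves these for the ball-bearing model only; the function name `shm_constants` is a hypothetical one chosen for illustration, and integer initial values are assumed.

```python
from fractions import Fraction


def shm_constants(x0, y0, vx0, vy0):
    """Return (C1, C2, C3, C4) for the ball-bearing model.

    The general solution is
    [x, y] = [1, 1](C1 cos t + C2 sin t) + [1, -1](C3 cos 3t + C4 sin 3t).
    """
    # Positions at t = 0: x0 = C1 + C3 and y0 = C1 - C3.
    c1 = Fraction(x0 + y0, 2)
    c3 = Fraction(x0 - y0, 2)
    # Velocities at t = 0: vx0 = C2 + 3*C4 and vy0 = C2 - 3*C4.
    c2 = Fraction(vx0 + vy0, 2)
    c4 = Fraction(vx0 - vy0, 6)
    return c1, c2, c3, c4


# Case (d): x(0) = 1, y(0) = 2, x'(0) = -1, y'(0) = 1.
print(*shm_constants(1, 2, -1, 1))  # 3/2 0 -1/2 -1/3
```

Motion along a single straight line, as in cases (a) and (b), corresponds to initial conditions for which one of the two pairs $(C_1, C_2)$ and $(C_3, C_4)$ vanishes.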


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Figure 4.3 Figure 4.4 


*Exercise 4.3 


An object moves in a plane so that its coordinates at time ¢ satisfy the 
equations 


Find two directions in which the object can describe simple harmonic motion 
along a straight line. Give the angular frequencies of such motions. 


End-of-section Exercise 


*Exercise 4.4 
Find the general solution of the system of equations 


%=a+ Ay, 
y=rr y. 
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5 Solving systems on the computer 


In this section you will see how systems of linear constant-coefficient differ- 
ential equations can be solved using the computer algebra package for the 
course. Note that these activities simply verify solutions found in the text, 
so you should not spend too much time on them. 


Use your computer to complete the following activities. ) PC 


Activity 5.1 


(a) Solve the initial-value problem 


l= [rao]. [a0] = [4], 


Compare your answer with that obtained in Example 2.1. 


(b) Find the general solution of the system 


fil=[t a] [a] + [osha]: 


Compare your answer with that obtained in Example 3.1. 


Activity 5.2 

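(a) Find the general solution of the system
\[
\begin{bmatrix} \dot{x} \\ \dot{y} \\ \dot{z} \end{bmatrix}
= \begin{bmatrix} 3 & 2 & 2 \\ 2 & 2 & 0 \\ 2 & 0 & 4 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \end{bmatrix}.
\]

Compare your answer with that obtained in Example 2.3.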


(b) Solve the initial-value problem This system of differential 
equations appeared in 


x 3.2 2 - é «(0) 0 Equations (2.2), but no 
y}=|}2 2 0 y| + | 2e*] , y(0) | = | 0 solution was obtained in the 
z 20 4 z 0 z(0) 0 text. 
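Activity 5.3
Find the general solution of the system
\[
\begin{bmatrix} \dot{x} \\ \dot{y} \\ \dot{z} \end{bmatrix}
= \begin{bmatrix} 5 & 0 & 3 \\ 3 & 2 & 3 \\ -6 & 0 & -4 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \end{bmatrix}.
\]
Compare your answer with that obtained in Example 2.4.

As you saw in Example 2.4,
this coefficient matrix has a
repeated eigenvalue but it is
possible to find three linearly
independent eigenvectors.

Activity 5.4
Solve the initial-value problem
\[
\begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix}
= \begin{bmatrix} 2 & 3 \\ 0 & 2 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix},
\quad
\begin{bmatrix} x(0) \\ y(0) \end{bmatrix}
= \begin{bmatrix} 4 \\ 3 \end{bmatrix}.
\]
Compare your answer with that obtained in Exercise 2.5.

As you saw in Exercise 2.5,
this coefficient matrix has a
repeated eigenvalue and only
one linearly independent
eigenvector can be found.

Activity 5.5
Solve the initial-value problem
\[
\begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix}
= \begin{bmatrix} 3 & -1 \\ 2 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix},
\quad
\begin{bmatrix} x(0) \\ y(0) \end{bmatrix}
= \begin{bmatrix} 3 \\ 1 \end{bmatrix}.
\]
Compare your answer with that obtained in Example 2.6.

As you saw in Example 2.6,
this coefficient matrix has
complex eigenvalues and
eigenvectors.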
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Outcomes 


After studying this unit you should be able to: 
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understand and use the terminology associated with systems of linear 
constant-coefficient differential equations; 

obtain the general solution of a homogeneous system of two or three first- 
order differential equations, by applying knowledge of the eigenvalues 
and eigenvectors of the coefficient matrix; 

obtain a particular solution of an inhomogeneous system of two first- 
order differential equations in certain simple cases, by using a trial solu- 
tion; 

understand the role of the principle of superposition in determining 
a particular integral of an inhomogeneous linear system of differential 
equations; 

obtain the general solution of an inhomogeneous system of two or three 
first-order differential equations, by combining its complementary func- 
tion and a particular integral; 

apply given initial conditions to obtain the solution of an initial-value 
problem which features a system of two first-order differential equations;

obtain the general solution of a homogeneous system of two or three
second-order equations, by applying knowledge of the eigenvalues and 
eigenvectors of the coefficient matrix; 

understand how systems of linear constant-coefficient differential equa- 
tions arise in mathematical models of the real world. 


Solutions to the exercises 


Section 2 
2.1 . 
@[;]>| 


: | ki | 4 | i inhomogeneous. 
w lel 

5 

1 

1 


| 


0 
1 y |; homogeneous. 
2 


1 
—2 
"| ; 
t ; inhomogeneous. 


(c) 


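2.2 The matrix of coefficients is $A = \begin{bmatrix}5 & 2\\ 2 & 5\end{bmatrix}$.

We are given that the eigenvectors of $A$ are $[1\ \ 1]^T$ with
corresponding eigenvalue $\lambda = 7$, and $[1\ \ {-1}]^T$ with
corresponding eigenvalue $\lambda = 3$. The general solution is
therefore

$$\begin{bmatrix}x\\ y\end{bmatrix} = \alpha\begin{bmatrix}1\\ 1\end{bmatrix}e^{7t} + \beta\begin{bmatrix}1\\ -1\end{bmatrix}e^{3t}.$$

Since $x = 4$ and $y = 0$ when $t = 0$, we have

$$4 = \alpha + \beta, \qquad 0 = \alpha - \beta.$$

Thus $\alpha = 2$ and $\beta = 2$, so

$$\begin{bmatrix}x\\ y\end{bmatrix} = 2\begin{bmatrix}1\\ 1\end{bmatrix}e^{7t} + 2\begin{bmatrix}1\\ -1\end{bmatrix}e^{3t}.$$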


2.3 The matrix of coefficients is A = 


Ne & 


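We are given that the eigenvectors of $A$ are $[2\ \ 1\ \ 1]^T$,
$[0\ \ 1\ \ 1]^T$ and $[0\ \ 1\ \ {-1}]^T$, corresponding to the
eigenvalues 5, 3 and 1, respectively.

The general solution is therefore

$$\begin{bmatrix}x\\ y\\ z\end{bmatrix} = \alpha\begin{bmatrix}2\\ 1\\ 1\end{bmatrix}e^{5t} + \beta\begin{bmatrix}0\\ 1\\ 1\end{bmatrix}e^{3t} + \gamma\begin{bmatrix}0\\ 1\\ -1\end{bmatrix}e^{t}.$$

Since $x = 4$, $y = 6$ and $z = 0$ when $t = 0$, we have

$$4 = 2\alpha, \qquad 6 = \alpha + \beta + \gamma, \qquad 0 = \alpha + \beta - \gamma.$$

Thus $\alpha = 2$, $\beta = 1$ and $\gamma = 3$, so

$$\begin{bmatrix}x\\ y\\ z\end{bmatrix} = 2\begin{bmatrix}2\\ 1\\ 1\end{bmatrix}e^{5t} + \begin{bmatrix}0\\ 1\\ 1\end{bmatrix}e^{3t} + 3\begin{bmatrix}0\\ 1\\ -1\end{bmatrix}e^{t}.$$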


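2.4 The matrix of coefficients is $A = \begin{bmatrix}0 & 0 & 1\\ 0 & 1 & 0\\ 1 & 0 & 0\end{bmatrix}$.

We are given that the eigenvectors are $[1\ \ 0\ \ {-1}]^T$ and
$[k\ \ l\ \ k]^T$, corresponding to the eigenvalues $-1$ and $1$
(repeated), respectively. But

$$[k\ \ l\ \ k]^T = k[1\ \ 0\ \ 1]^T + l[0\ \ 1\ \ 0]^T,$$

so the general solution is

$$\begin{bmatrix}x\\ y\\ z\end{bmatrix} = \alpha\begin{bmatrix}1\\ 0\\ -1\end{bmatrix}e^{-t} + \beta\begin{bmatrix}1\\ 0\\ 1\end{bmatrix}e^{t} + \gamma\begin{bmatrix}0\\ 1\\ 0\end{bmatrix}e^{t}.$$

Since $x = 7$, $y = 5$ and $z = 1$ when $t = 0$, we have

$$7 = \alpha + \beta, \qquad 5 = \gamma \qquad \text{and} \qquad 1 = -\alpha + \beta.$$

Thus $\alpha = 3$, $\beta = 4$ and $\gamma = 5$, so

$$\begin{bmatrix}x\\ y\\ z\end{bmatrix} = 3\begin{bmatrix}1\\ 0\\ -1\end{bmatrix}e^{-t} + 4\begin{bmatrix}1\\ 0\\ 1\end{bmatrix}e^{t} + 5\begin{bmatrix}0\\ 1\\ 0\end{bmatrix}e^{t} = \begin{bmatrix}3\\ 0\\ -3\end{bmatrix}e^{-t} + \begin{bmatrix}4\\ 5\\ 4\end{bmatrix}e^{t}.$$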


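2.5 The matrix of coefficients is $A = \begin{bmatrix}2 & 3\\ 0 & 2\end{bmatrix}$, and the
eigenvalues are $\lambda = 2$ (repeated).

The eigenvector equations are

$$(2 - \lambda)x + 3y = 0 \qquad \text{and} \qquad (2 - \lambda)y = 0,$$

which reduce, when $\lambda = 2$, to

$$3y = 0 \qquad \text{and} \qquad 0 = 0,$$

so $y = 0$. It follows that an eigenvector corresponding
to $\lambda = 2$ is $[1\ \ 0]^T$, and that one solution is $[1\ \ 0]^T e^{2t}$.

To find the general solution, we solve the equation
$(A - \lambda I)\mathbf{b} = \mathbf{v}$, where
$A - \lambda I = A - 2I = \begin{bmatrix}0 & 3\\ 0 & 0\end{bmatrix}$,
$\mathbf{v} = [1\ \ 0]^T$ and $\mathbf{b} = [b_1\ \ b_2]^T$. This gives

$$\begin{bmatrix}0 & 3\\ 0 & 0\end{bmatrix}\begin{bmatrix}b_1\\ b_2\end{bmatrix} = \begin{bmatrix}1\\ 0\end{bmatrix},$$

which reduces to $3b_2 = 1$ with no condition on $b_1$, so
$b_2 = \frac{1}{3}$ and we can take $b_1 = 0$, giving a solution

$$\left(\begin{bmatrix}1\\ 0\end{bmatrix}t + \begin{bmatrix}0\\ \frac{1}{3}\end{bmatrix}\right)e^{2t}.$$

Thus the general solution is

$$\begin{bmatrix}x\\ y\end{bmatrix} = \alpha\begin{bmatrix}1\\ 0\end{bmatrix}e^{2t} + \beta\left(\begin{bmatrix}1\\ 0\end{bmatrix}t + \begin{bmatrix}0\\ \frac{1}{3}\end{bmatrix}\right)e^{2t}.$$

Since $x = 4$ and $y = 3$ when $t = 0$, we have

$$4 = \alpha \qquad \text{and} \qquad 3 = \tfrac{1}{3}\beta,$$

giving $\alpha = 4$ and $\beta = 9$. Thus the required solution is

$$\begin{bmatrix}x\\ y\end{bmatrix} = 4\begin{bmatrix}1\\ 0\end{bmatrix}e^{2t} + 9\left(\begin{bmatrix}1\\ 0\end{bmatrix}t + \begin{bmatrix}0\\ \frac{1}{3}\end{bmatrix}\right)e^{2t} = \begin{bmatrix}9t + 4\\ 3\end{bmatrix}e^{2t}.$$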
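
The solution of Exercise 2.5 can be checked numerically. The sketch below assumes the system is $\dot{x} = 2x + 3y$, $\dot{y} = 2y$ with $x(0) = 4$, $y(0) = 3$, and that the solution is $x = (9t + 4)e^{2t}$, $y = 3e^{2t}$. It compares a central-difference estimate of the derivative with the right-hand side of the system; the function names and step size are illustrative choices, not part of the course material.

```python
import math


def solution(t):
    """Proposed solution x(t), y(t) of Exercise 2.5."""
    e = math.exp(2 * t)
    return (9 * t + 4) * e, 3 * e


def rhs(x, y):
    """Right-hand side of the system: dx/dt = 2x + 3y, dy/dt = 2y."""
    return 2 * x + 3 * y, 2 * y


def residual(t, step=1e-6):
    """Largest mismatch between a numerical derivative and the right-hand side."""
    x_hi, y_hi = solution(t + step)
    x_lo, y_lo = solution(t - step)
    dx = (x_hi - x_lo) / (2 * step)
    dy = (y_hi - y_lo) / (2 * step)
    fx, fy = rhs(*solution(t))
    return max(abs(dx - fx), abs(dy - fy))


print(solution(0.0))                      # initial values
print([residual(t) < 1e-5 for t in (0.0, 0.5, 1.0)])
```

A small residual at several values of t, together with the correct initial values, is good evidence (though not a proof) that the solution has been found correctly.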
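2.6 (a) The matrix of coefficients is $A = \begin{bmatrix}2 & 3\\ 2 & 1\end{bmatrix}$.

Using the given eigenvalues and eigenvectors, we obtain
the general solution

$$\begin{bmatrix}x\\ y\end{bmatrix} = \alpha\begin{bmatrix}3\\ 2\end{bmatrix}e^{4t} + \beta\begin{bmatrix}1\\ -1\end{bmatrix}e^{-t}.$$

Since $x = y = 1$ when $t = 0$, we have

$$1 = 3\alpha + \beta \qquad \text{and} \qquad 1 = 2\alpha - \beta,$$

giving $\alpha = \frac{2}{5}$ and $\beta = -\frac{1}{5}$.

The required solution is therefore

$$\begin{bmatrix}x\\ y\end{bmatrix} = \frac{2}{5}\begin{bmatrix}3\\ 2\end{bmatrix}e^{4t} - \frac{1}{5}\begin{bmatrix}1\\ -1\end{bmatrix}e^{-t}.$$

(b) The matrix of coefficients is $A = \begin{bmatrix}-3 & -2\\ 4 & 1\end{bmatrix}$.

Using the given eigenvalues and eigenvectors, we obtain
the general solution

$$\begin{bmatrix}x\\ y\end{bmatrix} = C\begin{bmatrix}1\\ -1 - i\end{bmatrix}e^{(-1+2i)t} + D\begin{bmatrix}1\\ -1 + i\end{bmatrix}e^{(-1-2i)t}.$$

Now

$$\begin{bmatrix}1\\ -1 - i\end{bmatrix}e^{(-1+2i)t} = \begin{bmatrix}e^{-t}(\cos 2t + i\sin 2t)\\ (-1 - i)e^{-t}(\cos 2t + i\sin 2t)\end{bmatrix} = \begin{bmatrix}e^{-t}\cos 2t + ie^{-t}\sin 2t\\ e^{-t}(\sin 2t - \cos 2t) - ie^{-t}(\sin 2t + \cos 2t)\end{bmatrix}.$$

So we have

$$\mathrm{Re}\left(\begin{bmatrix}1\\ -1 - i\end{bmatrix}e^{(-1+2i)t}\right) = \begin{bmatrix}\cos 2t\\ \sin 2t - \cos 2t\end{bmatrix}e^{-t},$$

$$\mathrm{Im}\left(\begin{bmatrix}1\\ -1 - i\end{bmatrix}e^{(-1+2i)t}\right) = \begin{bmatrix}\sin 2t\\ -\sin 2t - \cos 2t\end{bmatrix}e^{-t},$$

and the general real-valued solution can be written as

$$\begin{bmatrix}x\\ y\end{bmatrix} = C\begin{bmatrix}\cos 2t\\ \sin 2t - \cos 2t\end{bmatrix}e^{-t} + D\begin{bmatrix}\sin 2t\\ -\sin 2t - \cos 2t\end{bmatrix}e^{-t}.$$

Since $x = y = 1$ when $t = 0$, we have

$$1 = C \qquad \text{and} \qquad 1 = -C - D,$$

so $C = 1$, $D = -2$, and the required particular solution
is

$$\begin{bmatrix}x\\ y\end{bmatrix} = \begin{bmatrix}\cos 2t - 2\sin 2t\\ 3\sin 2t + \cos 2t\end{bmatrix}e^{-t}.$$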
2.7 


(a) |y| =a|-1|e+2 


R 


2 
x 2 
(b) Jy} =a]-1] +a] 1] r+ 
0 


XR 
w 




| e(—1-28)t, 


2.8 Using the given eigenvalues and eigenvectors in 
Procedure 2.3, we have 


1 
wore 0 lat -Sit 
1 3 
=a a 
1 
=e2*| 0 (cos (22) + isin (-2¢)) 
-3+¥3i 


real part 
sin (2) 
44 2" | 6 
v3 cos (2) + sin (2) 


2 
imaginary part 


Thus the general real-valued solution is 


x dl 
y| = Cl 1 et 
z 0 
cos (2) 
+ Cre2* 0 


where Cy, C2 and C3 are real constants. 


Putting « = y= 1 and z = 2 when t = 0, we have 


1 1 1 0 
1)/=C,|/1)+C@.| 0} 4+0¢C3) 9], 
2} fo) La} Ls 


so C + Cp =1, Cy = 1 and —3Cy + Y8C = 2, which 
give Cp = 0 and C3 = -&. Thus the required solution 


is 
sin ( ¥3 t) 


al 


xz 1 

= t 4 
y|= lj e' + Te 
Zz 0 


1 
at 


v3 cos (22) + sin (2) 


Section 3 


(lets 


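3.2 From Example 2.2, the complementary function is

$$\begin{bmatrix}x_c\\ y_c\end{bmatrix} = \alpha\begin{bmatrix}4\\ 1\end{bmatrix}e^{2t} + \beta\begin{bmatrix}1\\ -1\end{bmatrix}e^{-3t}.$$

For a particular integral, we try

$$\begin{bmatrix}x\\ y\end{bmatrix} = \begin{bmatrix}at + b\\ ct + d\end{bmatrix},$$

where $a$, $b$, $c$, $d$ are constants to be determined.
Substituting $x = at + b$, $y = ct + d$ into the differential
equations gives

$$a = (at + b) + 4(ct + d) - t + 2,$$
$$c = (at + b) - 2(ct + d) + 5t,$$

which become

$$(a + 4c - 1)t + (b + 4d + 2 - a) = 0,$$
$$(a - 2c + 5)t + (b - 2d - c) = 0.$$

Equating the coefficients of $t$ to zero gives

$$a + 4c - 1 = 0, \qquad a - 2c + 5 = 0,$$

which have the solution $a = -3$, $c = 1$.

Equating the constant terms to zero, and putting
$a = -3$, $c = 1$, gives

$$b + 4d + 5 = 0, \qquad b - 2d - 1 = 0,$$

which have the solution $b = -1$, $d = -1$.
Thus the required particular integral is

$$\begin{bmatrix}x_p\\ y_p\end{bmatrix} = \begin{bmatrix}-3t - 1\\ t - 1\end{bmatrix},$$

and the general solution is

$$\begin{bmatrix}x\\ y\end{bmatrix} = \begin{bmatrix}x_c\\ y_c\end{bmatrix} + \begin{bmatrix}x_p\\ y_p\end{bmatrix} = \alpha\begin{bmatrix}4\\ 1\end{bmatrix}e^{2t} + \beta\begin{bmatrix}1\\ -1\end{bmatrix}e^{-3t} + \begin{bmatrix}-3t - 1\\ t - 1\end{bmatrix}.$$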


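3.3 The complementary function is

$$\begin{bmatrix}x_c\\ y_c\end{bmatrix} = \alpha\begin{bmatrix}4\\ 1\end{bmatrix}e^{2t} + \beta\begin{bmatrix}1\\ -1\end{bmatrix}e^{-3t}.$$

For a particular integral, we try

$$\begin{bmatrix}x\\ y\end{bmatrix} = \begin{bmatrix}a\\ b\end{bmatrix}e^{-t},$$

where $a$ and $b$ are constants to be determined.
Substituting $x = ae^{-t}$, $y = be^{-t}$ into the differential
equations gives

$$-ae^{-t} = ae^{-t} + 4be^{-t} + 4e^{-t},$$
$$-be^{-t} = ae^{-t} - 2be^{-t} + 5e^{-t},$$

which, on dividing by $e^{-t}$ and rearranging, become

$$2a + 4b = -4, \qquad a - b = -5.$$

These equations have the solution $a = -4$, $b = 1$.

Thus the required particular integral is

$$\begin{bmatrix}x_p\\ y_p\end{bmatrix} = \begin{bmatrix}-4\\ 1\end{bmatrix}e^{-t},$$

and the general solution is

$$\begin{bmatrix}x\\ y\end{bmatrix} = \begin{bmatrix}x_c\\ y_c\end{bmatrix} + \begin{bmatrix}x_p\\ y_p\end{bmatrix} = \alpha\begin{bmatrix}4\\ 1\end{bmatrix}e^{2t} + \beta\begin{bmatrix}1\\ -1\end{bmatrix}e^{-3t} + \begin{bmatrix}-4\\ 1\end{bmatrix}e^{-t}.$$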


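3.4 The complementary function is

$$\begin{bmatrix}x_c\\ y_c\end{bmatrix} = \alpha\begin{bmatrix}4\\ 1\end{bmatrix}e^{2t} + \beta\begin{bmatrix}1\\ -1\end{bmatrix}e^{-3t}.$$

For a particular integral, we try

$$\begin{bmatrix}x\\ y\end{bmatrix} = \begin{bmatrix}a\sin 2t + b\cos 2t\\ c\sin 2t + d\cos 2t\end{bmatrix},$$

where $a$, $b$, $c$, $d$ are constants to be determined.
Substituting

$$x = a\sin 2t + b\cos 2t, \qquad y = c\sin 2t + d\cos 2t$$

into the differential equations gives, on rearranging,

$$(a + 4c + 2b - 4)\sin 2t + (b + 4d - 2a - 1)\cos 2t = 0,$$
$$(a - 2c + 2d + 1)\sin 2t + (b - 2d - 2c)\cos 2t = 0.$$

Equating the coefficients of $\sin 2t$ and $\cos 2t$ to zero gives
the equations

$$a + 2b + 4c = 4,$$
$$-2a + b + 4d = 1,$$
$$a - 2c + 2d = -1,$$
$$b - 2c - 2d = 0,$$

which have the solution $a = 0$, $b = 1$, $c = \frac{1}{2}$, $d = 0$.
Thus the required particular integral is

$$\begin{bmatrix}x_p\\ y_p\end{bmatrix} = \begin{bmatrix}\cos 2t\\ \frac{1}{2}\sin 2t\end{bmatrix},$$

and the general solution is

$$\begin{bmatrix}x\\ y\end{bmatrix} = \begin{bmatrix}x_c\\ y_c\end{bmatrix} + \begin{bmatrix}x_p\\ y_p\end{bmatrix} = \alpha\begin{bmatrix}4\\ 1\end{bmatrix}e^{2t} + \beta\begin{bmatrix}1\\ -1\end{bmatrix}e^{-3t} + \begin{bmatrix}\cos 2t\\ \frac{1}{2}\sin 2t\end{bmatrix}.$$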
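
The four simultaneous equations for a, b, c and d in Solution 3.4 are tedious to solve by hand. As a cross-check, the sketch below solves them exactly by Gauss-Jordan elimination with rational arithmetic. It assumes the equations are a + 2b + 4c = 4, -2a + b + 4d = 1, a - 2c + 2d = -1 and b - 2c - 2d = 0; the function name `solve` is an illustrative choice.

```python
from fractions import Fraction


def solve(augmented):
    """Solve a square linear system given as an augmented matrix [A | rhs]."""
    n = len(augmented)
    m = [[Fraction(v) for v in row] for row in augmented]
    for col in range(n):
        # Find a row with a non-zero entry in this column and swap it up.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so that the pivot entry is 1.
        m[col] = [v / m[col][col] for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[-1] for row in m]


# Columns are a, b, c, d; the last column is the right-hand side.
system = [
    [1, 2, 4, 0, 4],
    [-2, 1, 0, 4, 1],
    [1, 0, -2, 2, -1],
    [0, 1, -2, -2, 0],
]
a, b, c, d = solve(system)
print(a, b, c, d)
```

Working with `Fraction` rather than floating-point numbers keeps values such as one half exact, which makes comparison with a hand calculation straightforward.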


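3.5 The complementary function is

$$\begin{bmatrix}x_c\\ y_c\end{bmatrix} = \alpha\begin{bmatrix}1\\ -1\end{bmatrix}e^{-t} + \beta\begin{bmatrix}3\\ 2\end{bmatrix}e^{4t}.$$

We try a particular integral $[x\ \ y]^T = [a\ \ b]^T e^{2t}$, which,
on substituting into the differential equations, gives

$$2ae^{2t} = 2ae^{2t} + 3be^{2t} + e^{2t},$$
$$2be^{2t} = 2ae^{2t} + be^{2t} + 4e^{2t},$$

which give $3b + 1 = 0$ and $b - 2a = 4$, so $b = -\frac{1}{3}$ and
$a = -\frac{13}{6}$. The general solution is therefore

$$\begin{bmatrix}x\\ y\end{bmatrix} = \alpha\begin{bmatrix}1\\ -1\end{bmatrix}e^{-t} + \beta\begin{bmatrix}3\\ 2\end{bmatrix}e^{4t} - \frac{1}{6}\begin{bmatrix}13\\ 2\end{bmatrix}e^{2t}.$$

Putting $t = 0$, we obtain

$$\tfrac{5}{6} = \alpha + 3\beta - \tfrac{13}{6}, \qquad \tfrac{2}{3} = -\alpha + 2\beta - \tfrac{1}{3},$$

so $\alpha + 3\beta = 3$ and $-\alpha + 2\beta = 1$, which give
$\alpha = \frac{3}{5}$ and $\beta = \frac{4}{5}$.

The required solution is therefore

$$\begin{bmatrix}x\\ y\end{bmatrix} = \frac{3}{5}\begin{bmatrix}1\\ -1\end{bmatrix}e^{-t} + \frac{4}{5}\begin{bmatrix}3\\ 2\end{bmatrix}e^{4t} - \frac{1}{6}\begin{bmatrix}13\\ 2\end{bmatrix}e^{2t}.$$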




Unit 11 Systems of differential equations 


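3.6 These equations are of the form $\dot{\mathbf{x}} = A\mathbf{x} + \mathbf{k}e^{\lambda_2 t}$,
where $\mathbf{x} = \begin{bmatrix}x\\ y\end{bmatrix}$, $A = \begin{bmatrix}2 & 3\\ 2 & 1\end{bmatrix}$, $\mathbf{k} = \begin{bmatrix}1\\ 2\end{bmatrix}$, and $\lambda_1 = -1$,
$\lambda_2 = 4$ are the eigenvalues corresponding to eigenvectors
$\mathbf{v}_1 = [1\ \ {-1}]^T$ and $\mathbf{v}_2 = [3\ \ 2]^T$, respectively.
Using Procedure 3.2, $\mathbf{k} = p\mathbf{v}_1 + q\mathbf{v}_2$ for some numbers $p$
and $q$, i.e. $[1\ \ 2]^T = p[1\ \ {-1}]^T + q[3\ \ 2]^T$, which gives
$p = -\frac{4}{5}$ and $q = \frac{3}{5}$. A particular integral is of the form
$\mathbf{x}_p = (a\mathbf{v}_1 + b\mathbf{v}_2 t)e^{\lambda_2 t}$, where $a = p/(\lambda_2 - \lambda_1)$ and $b = q$.
Substituting our values for $p$ and $q$ into these expressions
gives $a = -\frac{4}{5}/5 = -\frac{4}{25}$ and $b = \frac{3}{5}$, hence a particular
integral is

$$\mathbf{x}_p = \left(-\frac{4}{25}\begin{bmatrix}1\\ -1\end{bmatrix} + \frac{3}{5}\begin{bmatrix}3\\ 2\end{bmatrix}t\right)e^{4t}.$$

The required general solution is therefore

$$\mathbf{x} = \alpha\begin{bmatrix}1\\ -1\end{bmatrix}e^{-t} + \beta\begin{bmatrix}3\\ 2\end{bmatrix}e^{4t} + \frac{1}{25}\begin{bmatrix}45t - 4\\ 30t + 4\end{bmatrix}e^{4t}.$$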
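
The arithmetic in Solution 3.6 can be reproduced exactly with rational numbers. The sketch below assumes the data stated in the solution: eigenvalues -1 and 4 with eigenvectors [1 -1] and [3 2], and forcing term k e^{4t} with k = [1 2]. It also assumes the coefficient matrix has rows (2, 3) and (2, 1), which is the matrix consistent with those eigenpairs. The variable names are illustrative only.

```python
from fractions import Fraction

A = [[2, 3], [2, 1]]            # assumed coefficient matrix
v1, v2 = [1, -1], [3, 2]        # eigenvectors for lam1 and lam2
lam1, lam2 = -1, 4
k = [1, 2]                      # forcing term is k * exp(lam2 * t)

# Solve p*v1 + q*v2 = k by Cramer's rule.
det = v1[0] * v2[1] - v2[0] * v1[1]
p = Fraction(k[0] * v2[1] - v2[0] * k[1], det)
q = Fraction(v1[0] * k[1] - v1[1] * k[0], det)

# Coefficients of the trial solution (a*v1 + b*v2*t) * exp(lam2 * t).
a = p / (lam2 - lam1)
b = q


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# Substituting the trial solution and cancelling exp(lam2 * t) leaves two
# identities: one for the constant terms and one for the terms in t.
const_lhs = [b * v2[i] + lam2 * a * v1[i] for i in range(2)]
const_rhs = [matvec(A, [a * x for x in v1])[i] + k[i] for i in range(2)]
t_lhs = [lam2 * b * x for x in v2]
t_rhs = matvec(A, [b * x for x in v2])

print(p, q, a, b)
print(const_lhs == const_rhs, t_lhs == t_rhs)
```

The two comparisons confirm that the trial form satisfies the differential equations identically, not just at selected values of t.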


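3.7 We can choose $\mathbf{h}_1 = [t\ \ 0]^T$ and $\mathbf{h}_2 = [0\ \ \sin t]^T$,
then use the principle of superposition.

Choosing a particular integral

$$[x\ \ y]^T = [at + b\ \ \ ct + d]^T$$

with $\mathbf{h}_1 = [t\ \ 0]^T$, and substituting into the original
system, we obtain

$$a = 2(at + b) + 3(ct + d) + t,$$
$$c = 2(at + b) + (ct + d),$$

so

$$2a + 3c = -1,$$
$$2a + c = 0,$$
$$-a + 2b + 3d = 0,$$
$$2b - c + d = 0,$$

which have the solution $a = \frac{1}{4}$, $b = -\frac{7}{16}$, $c = -\frac{1}{2}$, $d = \frac{3}{8}$.
Hence $\frac{1}{16}[4t - 7\ \ \ {-8t} + 6]^T$ is a particular integral of
the system $\dot{\mathbf{x}} = A\mathbf{x} + \mathbf{h}_1$.

Choosing a particular integral

$$[x\ \ y]^T = [a\sin t + b\cos t\ \ \ c\sin t + d\cos t]^T$$

with $\mathbf{h}_2 = [0\ \ \sin t]^T$, and substituting into the original
system, we obtain

$$a\cos t - b\sin t = 2(a\sin t + b\cos t) + 3(c\sin t + d\cos t),$$
$$c\cos t - d\sin t = 2(a\sin t + b\cos t) + (c\sin t + d\cos t) + \sin t,$$

which simplify to

$$(2a + b + 3c)\sin t + (-a + 2b + 3d)\cos t = 0,$$
$$(2a + c + d + 1)\sin t + (2b - c + d)\cos t = 0.$$

So

$$2a + b + 3c = 0,$$
$$-a + 2b + 3d = 0,$$
$$2a + c + d = -1,$$
$$2b - c + d = 0,$$

which have the solution $a = -\frac{15}{34}$, $b = \frac{9}{34}$, $c = \frac{7}{34}$,
$d = -\frac{11}{34}$. Hence $\frac{1}{34}[-15\sin t + 9\cos t\ \ \ 7\sin t - 11\cos t]^T$ is
a particular integral of the system $\dot{\mathbf{x}} = A\mathbf{x} + \mathbf{h}_2$.

Using the principle of superposition, the required general
solution is

$$\begin{bmatrix}x\\ y\end{bmatrix} = \alpha\begin{bmatrix}1\\ -1\end{bmatrix}e^{-t} + \beta\begin{bmatrix}3\\ 2\end{bmatrix}e^{4t} + \frac{1}{16}\begin{bmatrix}4t - 7\\ -8t + 6\end{bmatrix} + \frac{1}{34}\begin{bmatrix}-15\sin t + 9\cos t\\ 7\sin t - 11\cos t\end{bmatrix}.$$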


Section 4 


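4.1 The matrix of coefficients is $A = \begin{bmatrix}5 & 2\\ 2 & 5\end{bmatrix}$, and we
are given that the eigenvectors are $[1\ \ 1]^T$ corresponding
to the eigenvalue $\lambda = 7$, and $[1\ \ {-1}]^T$ corresponding
to the eigenvalue $\lambda = 3$.

It follows that the general solution is

$$\begin{bmatrix}x\\ y\end{bmatrix} = C_1\begin{bmatrix}1\\ 1\end{bmatrix}e^{\sqrt{7}t} + C_2\begin{bmatrix}1\\ 1\end{bmatrix}e^{-\sqrt{7}t} + C_3\begin{bmatrix}1\\ -1\end{bmatrix}e^{\sqrt{3}t} + C_4\begin{bmatrix}1\\ -1\end{bmatrix}e^{-\sqrt{3}t}.$$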


4.2 Using the given eigenvalues and eigenvectors, we 
obtain the general solution 


x 1 1 
y| =C, 10 ev 0 | ev 
Zz 0 0 
1 1 : 
+ C3 | —5 | cos(V3t) + Cy | —5 | sin(V3¢) 
0 0 
—5 —5 
+ Cs 4| e! + C6 4| e~ 
14 14 
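4.3 The matrix of coefficients is $A = \begin{bmatrix}-\frac{25}{7} & \frac{6}{7}\\ \frac{9}{7} & -\frac{10}{7}\end{bmatrix}$.

The characteristic equation of $A$ is $\lambda^2 + 5\lambda + 4 = 0$, so
the eigenvalues are $\lambda = -4$ and $\lambda = -1$.

The eigenvector equations are

$$\left(-\tfrac{25}{7} - \lambda\right)x + \tfrac{6}{7}y = 0,$$
$$\tfrac{9}{7}x + \left(-\tfrac{10}{7} - \lambda\right)y = 0.$$

$\lambda = -4$: The eigenvector equations become

$$\tfrac{3}{7}x + \tfrac{6}{7}y = 0,$$
$$\tfrac{9}{7}x + \tfrac{18}{7}y = 0,$$

which reduce to the equation $-2y = x$, so a corresponding
eigenvector is $[-2\ \ 1]^T$.

$\lambda = -1$: The eigenvector equations become

$$-\tfrac{18}{7}x + \tfrac{6}{7}y = 0,$$
$$\tfrac{9}{7}x - \tfrac{3}{7}y = 0,$$

which reduce to the equation $y = 3x$, so a corresponding
eigenvector is $[1\ \ 3]^T$.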


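The general solution is therefore

$$\begin{bmatrix}x\\ y\end{bmatrix} = \begin{bmatrix}-2\\ 1\end{bmatrix}(C_1\cos 2t + C_2\sin 2t) + \begin{bmatrix}1\\ 3\end{bmatrix}(C_3\cos t + C_4\sin t).$$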


The object describes simple harmonic motion along the 
eigenvectors, so there are two possibilities for simple 
harmonic motion along a straight line: 

e motion in the direction of the vector [-2  1]" (that 
is, the line « + 2y = 0) with angular frequency 2 
(corresponding to C3 = Cy = 0); 

e motion in the direction of the vector [1 3]? (that 
is, the line y = 3x) with angular frequency 1 (cor- 
responding to Cy; = Cz = 0). 


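4.4 The matrix of coefficients is $A = \begin{bmatrix}1 & 4\\ 1 & 1\end{bmatrix}$.

The characteristic equation is $\lambda^2 - 2\lambda - 3 = 0$, so the
eigenvalues are $\lambda = 3$ and $\lambda = -1$. Solving the eigenvector
equations, we obtain eigenvectors $[2\ \ 1]^T$ and
$[-2\ \ 1]^T$, respectively. Using Procedure 4.1, we then
have

$$\begin{bmatrix}x\\ y\end{bmatrix} = \begin{bmatrix}2\\ 1\end{bmatrix}\left(C_1 e^{\sqrt{3}t} + C_2 e^{-\sqrt{3}t}\right) + \begin{bmatrix}-2\\ 1\end{bmatrix}(C_3\cos t + C_4\sin t).$$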


Study guide for Unit 12


This unit extends the calculus of functions of one variable to functions of 
several variables. It would be helpful if you can recall the Second Derivative 
Test (the use of the second derivative to classify maxima and minima) and 
Taylor polynomials for functions of one variable (although these topics are 
introduced from scratch). 


We shall also discuss the application of functions of several variables to 
mechanics, and you should make sure that you can recall the concept of po- 
tential energy from Unit 8 (in particular, the potential energy of a stretched 
or compressed spring). 


Sections 1 and 2 contain material that will be needed in later units of the 
course. 


One of the methods that we introduce is based on finding the eigenvalues 
of a matrix, so you may need to refer back to Unit 10 as you work through 
Section 3. 


You will use the computer algebra package for the course in Section 4. 


We recommend that you study the Introduction and Section 1 over two 
study sessions (perhaps breaking at the end of Subsection 1.2), and that 
you allocate a study session to each of Sections 2, 3 and 4. 


Introduction


So far in this course, most of the mathematical models have concerned phys- 
ical quantities that vary with respect to a single variable. This variable has 
usually been time. In this unit we look at models that depend on more than 
one variable. 


As a simple example, let us consider the illumination from a street lamp, 
as shown in Figure 0.1. (That is, we are considering the illumination falling 
on, say, a flat paving slab at the point P, due to a street lamp placed at 
the point Q.) There are a number of variable quantities here. To simplify 
things a little, let us assume that the power rating of the light bulb is fixed 
at 1000 watts. Nevertheless, the height h of the bulb above the street, the 
distance x of P from the base of the lamp, the distance r from P to Q, and 
the angle $\theta$ (as shown) are all features that could be taken into consideration.


It is possible to express h and x in terms of r and $\theta$, or to express r and $\theta$
in terms of h and x. The quantities h and x are easy to measure, but the
physics of the situation gives the illumination at P more readily in terms of
the two variables r and $\theta$, via the equation

$$I = \frac{1000\cos\theta}{r^2}, \tag{0.1}$$


where I is the ‘brightness’ of the illumination per square metre falling on 
the slab at P. 


In order to express I in terms of h and x rather than r and $\theta$, we use the
formulas $\cos\theta = h/r$ and $r^2 = h^2 + x^2$, giving

$$I(h, x) = \frac{1000h}{(h^2 + x^2)^{3/2}}. \tag{0.2}$$

In Equation (0.1) the two independent variables are r and $\theta$, while in Equation
(0.2) the independent variables are h and x. Thus the initial choice
of variables in a problem can affect the form of the function that we need
to consider, and may well affect the difficulty of our calculations. In this
case, it is quite reasonable to represent I as a function of the two variables
h and x, so let us suppose that we have made that choice.


Often we wish to gain some physical insight into a problem, but, as with 
Equation (0.2), the practical implications of a particular formula are not 
obvious. However, we can gain some understanding of this situation if we 
keep the height h fixed, at say 3 metres, and vary the distance x. The 
function I = I(h, x) now becomes a function of the single variable x, so we 
can sketch its graph, as shown in Figure 0.2. We can now see that the 
point of brightest illumination is when P is immediately below the lamp (as 
intuition might lead us to expect). 


Suppose now that we keep the point P fixed at, say, x = 10 metres, but vary 
the height of the lamp. Once again, I becomes a function of a single variable,
in this case h, and again we can sketch a graph, as shown in Figure 0.3. This 
time the result is perhaps a little more unexpected, for our graph tells us 
that there is an optimum height for the lamp at which the illumination on 
the horizontal slab at P is greatest. (This is because, as h increases, there is 
a trade-off between the increasing distance of the lamp from P, which tends 
to decrease the illumination, and the decreasing obliqueness of the angle, 
which tends to increase the illumination.) 
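
The two one-variable views of the illumination can be explored numerically. The sketch below assumes the formula $I(h, x) = 1000h/(h^2 + x^2)^{3/2}$ from Equation (0.2); the function names, the search interval and the use of a ternary search are illustrative choices, not part of the course material. Setting the derivative with respect to h to zero gives $h = x/\sqrt{2}$ for the optimum height, which the search should reproduce.

```python
import math


def illumination(h, x, power=1000.0):
    """Illumination at distance x from the base of a lamp of height h."""
    return power * h / (h * h + x * x) ** 1.5


def best_height(x, lo=0.1, hi=100.0, iterations=200):
    """Locate the height maximising the illumination by ternary search.

    This assumes that, for fixed x, the illumination rises to a single
    peak and then falls as h increases across the interval [lo, hi].
    """
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if illumination(m1, x) < illumination(m2, x):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


# With h fixed at 3 metres, the illumination falls as x moves away from 0.
print([round(illumination(3.0, x), 2) for x in (0.0, 1.0, 2.0, 5.0)])

# With x fixed at 10 metres, compare the search with x / sqrt(2).
print(best_height(10.0), 10.0 / math.sqrt(2))
```

For x = 10 metres the optimum height is therefore a little over 7 metres, consistent with the trade-off described above.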
[Figure 0.1: the street lamp at Q and the point P, with labels h and x]

The technical term for ‘brightness’ in this sense is luminance. The form
of its dependence on r is due to the fact that the surface area of a
sphere of radius r is proportional to $r^2$. The reason for its dependence
on $\theta$ is that the light is falling obliquely on the slab.

[Figure 0.2: graph of I against x]

[Figure 0.3: graph of I against h]




You may well wonder if it is possible to sketch a graph that represents the
variation of I when both variables h and x are allowed to change. Indeed it
is, but we need more than two dimensions, and the resulting surface can be
hard to interpret. (We shall look at such surfaces in Section 1.)

A surface here is a set of points in space whose coordinates satisfy a
particular equation.

In order to deal sensibly with functions of two or more variables, we need

mathematical tools similar to those that we have at our disposal for functions 
of a single variable. We should like to be able to differentiate these functions, 
to locate the points at which they take their greatest (or least) values, and 
to approximate them by polynomials. This may enable us to understand 
the behaviour of such functions and so give us an insight into the physical 
situations from which they arise. In fact, we need to develop a form of 
calculus that applies to such functions, and that is the purpose of this unit. 


Section 1 concentrates on functions of two variables. We show how such 
functions may be used to define a surface, and we then extend the notion 
of derivative in order to investigate the slope of such a surface. Section 2 
provides a brief discussion of Taylor polynomials for functions of one vari- 
able and then extends the discussion to Taylor polynomials for functions of 
several variables. Section 3 discusses the main topic of the unit: the clas- 
sification of stationary points (points where the first derivatives are zero) 
for functions of two or more variables. Section 4 broadens the approach to 
include applications to mechanics. 


1 Functions of two variables 


Our main objective in this section is to extend some of the ideas of the 
calculus of functions of one variable: most importantly, the concepts of 
derivative, stationary point, local maximum and local minimum. However, 
before we discuss the calculus of functions of two variables, we need to 
discuss the concept of a function of two variables. We introduce the concept 
with a physical situation. 


1.1 Introducing functions of two variables


Imagine an experiment in which a thin flat metal disc of radius 2 metres is 
heated (see Figure 1.1(a)). At a particular moment, we record the temper- 
atures at various points on its upper surface. 


We could specify the points on the surface of the disc by means of a Cartesian
coordinate system, using the centre of the disc as the origin, as shown
in Figure 1.1(b). The temperature may well vary over the surface of the
disc, so that the temperature at O is higher than at P, for example.
Nevertheless, at any given moment, each point (x, y) of the disc has a well-defined
temperature ($\Theta$, say, in degrees Celsius). We could denote this temperature
by $\Theta(x, y)$ to remind ourselves that the value of $\Theta$ depends on our particular
choice of x and y. We say that $\Theta$ is a function of the two variables x and y,
and that the dependent variable $\Theta$ is a function of the two independent
variables x and y.

[Figure 1.1: (a) the disc; (b) the disc with x- and y-axes, origin O and point P]

There is a natural restriction on the independent variables x and y arising
from the physical situation: $\Theta(x, y)$ has been defined only for points (x, y)
lying on the disc. That is, the domain of the function $\Theta$ is the set of points
(x, y) in the plane such that $x^2 + y^2 \le 4$.
Suppose that a mathematical model of this situation predicts that

$$\Theta(x, y) = 10\left(10e^{-(x^2 + y^2)} + 1\right) \qquad (x^2 + y^2 \le 4). \tag{1.1}$$

Then it is a simple matter to calculate the predicted temperature at any
point on the disc. For example, at the point (1, 1.5) the predicted temperature
(in degrees Celsius) is

$$\Theta(1, 1.5) = 10\left(10e^{-(1^2 + 1.5^2)} + 1\right) = 10\left(10e^{-3.25} + 1\right) \simeq 13.9.$$
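
A function of two variables with a restricted domain translates directly into code. The following is a minimal sketch, assuming the model $\Theta(x, y) = 10(10e^{-(x^2 + y^2)} + 1)$ and a domain that includes the edge of the disc of radius 2 metres; the function name and the choice to raise an error outside the domain are illustrative.

```python
import math


def temperature(x, y):
    """Predicted temperature (degrees Celsius) at the point (x, y) of the disc."""
    if x * x + y * y > 4:
        raise ValueError("point lies outside the disc of radius 2")
    return 10 * (10 * math.exp(-(x * x + y * y)) + 1)


print(round(temperature(1, 1.5), 1))   # the worked value at (1, 1.5)
print(temperature(0, 0))               # temperature at the centre
```

Raising an error for points off the disc mirrors the statement that the function is defined only on its domain.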


Exercise 1.1 

Given f(x,y) = 3x7 — 2y?, evaluate the following. 

*(a) f(2,38) *(b) (3,2) *(c) f(a,b) *(d) f(d,a) 

(e) f(2a,b) (f) fla—6,0) — (g) f(w,2) (a) Fly, 2) 


Exercise 1.2 


Figure 1.2 shows a double pendulum consisting of two light model rods OA 
and AB with a smooth joint at A. The rods move in a vertical plane, with 
O attached to a fixed point by means of a frictionless hinge. A particle of 
mass m is attached to B, and the angles $\theta$ and $\phi$ are as shown.


Express the potential energy U of the system (taking a horizontal line
through O as the datum) in terms of the independent variables $\theta$ and $\phi$.
What is the least possible value of U?

[Figure 1.2]


1.2 Geometric interpretation 


A function of two variables expresses a dependent variable in terms of two 
independent variables, so there are three varying quantities altogether. Thus 
a graph of such a function will require three dimensions. 


Definition

A function of two variables is a function f whose domain is $\mathbb{R}^2$ (or
a subset of $\mathbb{R}^2$) and whose codomain is $\mathbb{R}$. Thus, for each point (x, y)
in the domain of f, there is a unique value z defined by

$$z = f(x, y).$$

The set of all points with coordinates (x, y, z) = (x, y, f(x, y)), plotted
in a three-dimensional Cartesian coordinate system, is the surface
with equation z = f(x, y).

Each variable is in $\mathbb{R}$, hence the domain of a function of two variables
is denoted by $\mathbb{R}^2$.


The definition of a function of two variables generalizes in a straightforward
way: a function of n variables is a function whose domain is $\mathbb{R}^n$ (or a
subset of $\mathbb{R}^n$) and whose codomain is $\mathbb{R}$.


The simplest of all surfaces z = f(x,y) for a function of two variables arises 
from choosing f to be the zero function. We then have the equation z = 0, 
and the surface is the plane that contains the x- and y-axes, known as the 
(x, y)-plane. 

More generally, the surface corresponding to any linear function of two 
variables z = f(x,y) = Ax + By+C (where A, B and C are constants) is 
a plane. 
For example, the equation $z = f(x, y) = -\frac{2}{3}x - 2y + 2$ represents a plane
passing through the three points (3, 0, 0), (0, 1, 0) and (0, 0, 2), and extending
indefinitely. Part of this plane is illustrated in Figure 1.3.

[Figure 1.3]
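
A quick check confirms that a linear function of two variables passes through given points. The sketch below assumes the plane in the example is $z = -\frac{2}{3}x - 2y + 2$, which is the linear function determined by the three points quoted in the text. Exact rational arithmetic is used so that the comparison with zero is not affected by rounding.

```python
from fractions import Fraction


def f(x, y):
    """Height z of the plane above the point (x, y)."""
    return Fraction(-2, 3) * x - 2 * y + 2


points = [(3, 0, 0), (0, 1, 0), (0, 0, 2)]
on_plane = [f(x, y) == z for x, y, z in points]
print(on_plane)
```

Three points that do not lie on a single straight line determine a plane uniquely, so agreement at all three is enough to identify it.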


The surfaces corresponding to the functions $p(x, y) = x^2 + y^2$ (see
Figure 1.4(a)) and $h(x, y) = y^2 - x^2$ (see Figure 1.4(b)) are not planes, but you
will see later that their behaviour near the origin is of particular interest.
The surface for p(x, y) is a paraboloid (which can be obtained by plotting
the parabola $z = x^2$ in the (x, z)-plane and then rotating it about the
z-axis). The surface for h(x, y) is a hyperboloid (which cannot be obtained by
rotating a curve about the z-axis or any other axis).

[Figure 1.4: (a) paraboloid; (b) hyperboloid]


Section functions 


In general, the surface representing a function may be complicated and dif- 
ficult to visualize. The computer algebra package for the course can be put 
to good use here, in that you can use it to plot a function and view it from 
various perspectives. But this may not be enough, and it is often helpful 
to consider the function obtained by fixing all the independent variables
except one at specific values. In the case of the function $p(x, y) = x^2 + y^2$, for
example, we might choose to fix the value of y at 2, in which case we are
left with the function of a single variable

$$p(x, 2) = x^2 + 4.$$

This function is known as a section function of $p(x, y) = x^2 + y^2$ with y
fixed at 2.


In the case of the surfaces shown in Figure 1.4, it is worth looking at their 
behaviour near the origin. The section functions p(x,0) and p(0,y), and
h(x,0) and h(0,y), are quite illuminating. They show, very clearly, proper- 
ties of the surfaces that you may have already observed. 
The section functions p(x, 0) and p(0, y) are shown in Figures 1.5 and 1.6,
respectively.

[Figure 1.5: part of the surface $z = p(x, y) = x^2 + y^2$, showing the section z = p(x, 0)]

[Figure 1.6: part of the surface $z = p(x, y) = x^2 + y^2$, showing the section z = p(0, y)]

The surface $z = p(x, y) = x^2 + y^2$ has a local minimum at the origin;
correspondingly, each of the section functions p(x, 0) and p(0, y) has a local
minimum there.

The section functions h(x, 0) and h(0, y) are shown in Figures 1.7 and 1.8,
respectively.

[Figure 1.7: part of the surface $z = h(x, y) = y^2 - x^2$, showing the section z = h(x, 0)]

[Figure 1.8: part of the surface $z = h(x, y) = y^2 - x^2$, showing the section z = h(0, y)]
The surface $z = h(x, y) = y^2 - x^2$ has neither a local maximum nor a local
minimum at the origin, corresponding to the fact that the section function
h(x, 0) has a local maximum at the origin, while the section function h(0, y)
has a local minimum there.


Of course, we have not yet defined what we mean by a (local) maximum or 
minimum of a function of two variables (that comes in Section 3), but it is 
already clear that a surface can behave in a more complicated fashion than 
the graph of a function of one variable. 


It is worth spending a little time at this point to recall how to use calculus to 
justify that the section functions h(x,0) and h(0,y) have a local maximum 
and a local minimum, respectively, at the origin. 


A standard method in this context is the Second Derivative Test, which
applies to ‘sufficiently smooth’ functions of one variable. (In this context,
‘sufficiently smooth’ means that the first and second derivatives exist.) If
f(x) is a function whose first and second derivatives exist, and at some point
x = a we have $f'(a) = 0$, then a is a stationary point of f(x). Often (but
not always) it will be a local maximum or a local minimum. The Second
Derivative Test is based on the evaluation of $f''(x)$ at a. There are three
possibilities.

(a) If $f''(a)$ is negative, then f has a local maximum at a.

(b) If $f''(a)$ is positive, then f has a local minimum at a.

(c) If $f''(a) = 0$, then the test is inconclusive. There may still be a local
maximum or a local minimum at a, but another possibility is that there
is a point of inflection, such as occurs at x = 0 for the function $f(x) = x^3$
(see Figure 1.9).

[Figure 1.9]


Example 1.1

Use the Second Derivative Test to verify that h(x, 0) has a local maximum
at x = 0 and that h(0, y) has a local minimum at y = 0, where
$h(x, y) = y^2 - x^2$.

Solution

We have $h(x, 0) = -x^2$. So $\dfrac{d}{dx}(-x^2) = -2x$, and this is zero when x = 0.
Thus h(x, 0) has a stationary point at x = 0. Now $\dfrac{d^2}{dx^2}(-x^2) = -2$, which
is negative for all values of x, so this stationary point is a local maximum.

We have $h(0, y) = y^2$. So $\dfrac{d}{dy}(y^2) = 2y$, and this is zero when y = 0. Thus
h(0, y) has a stationary point at y = 0. Now $\dfrac{d^2}{dy^2}(y^2) = 2$, which is positive
for all values of y, so this stationary point is a local minimum.
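
The Second Derivative Test for section functions can be mimicked numerically. The sketch below assumes $h(x, y) = y^2 - x^2$ and estimates the derivatives by central differences rather than by exact differentiation, so it is only an approximate check; the function names, step size and tolerance are illustrative choices.

```python
def h(x, y):
    return y * y - x * x


def first_derivative(g, a, step=1e-3):
    """Central-difference estimate of g'(a)."""
    return (g(a + step) - g(a - step)) / (2 * step)


def second_derivative(g, a, step=1e-3):
    """Central-difference estimate of g''(a)."""
    return (g(a + step) - 2 * g(a) + g(a - step)) / step ** 2


def classify(g, a, tol=1e-4):
    """Apply the Second Derivative Test to the one-variable function g at a."""
    if abs(first_derivative(g, a)) > tol:
        return "not stationary"
    curvature = second_derivative(g, a)
    if curvature < -tol:
        return "local maximum"
    if curvature > tol:
        return "local minimum"
    return "inconclusive"


print(classify(lambda x: h(x, 0), 0.0))   # section function h(x, 0)
print(classify(lambda y: h(0, y), 0.0))   # section function h(0, y)
print(classify(lambda x: x ** 3, 0.0))    # the cubic, where the test fails
```

The third call illustrates case (c) of the test: for the cubic, both derivatives vanish at the origin and the test gives no verdict.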


Exercise 1.3 


Show that the section function of $F(x, y) = 100e^{-(x^2 + y^2)}$ with y fixed at 0
has a local maximum at x = 0.


The concept of a section function can be extended easily to functions of more 


a The domain of w should be 


3 y+ 2t restricted so that y + 2 can 
with a fixed at 3 and ¢ fixed at 1 is w(3,y,1) = Pr which is a function never be zero. 
y 


than two variables. For example, the section function of w(z, y,t) = 


of y only. 
(Section functions are always functions of one variable, so we have to fix
the values of all the variables except one to obtain a section function.) One 
obvious advantage of considering section functions (rather than the original 
function of two or more variables) is that we can apply the familiar calculus 
techniques for functions of one variable. 


1.3 First-order partial derivatives 


As a first step towards our goal of extending the ideas of calculus to functions 
of two (or more) variables, we begin by recalling the role played by the 
tangent to a graph in the calculus of functions of one variable. If we imagine 
a tangent line sliding along a graph, then each time the line is horizontal, 
we have a stationary point. We apply the same idea to functions of two 
variables, only this time we slide a tangent plane over the surface. Let us 
be a little more precise about this. 


In most cases, the tangent line to a curve C at a point P on C is the straight
line that touches C at P, but does not cross the curve at that point.


Sometimes, however, even if C is a smooth curve as it goes through P, it
is not possible to find a line with the ‘non-crossing’ property. Suppose, for
example, that C is the graph of $y = x^3$ (see Figure 1.9) and P is the origin.
Then every straight line through the origin ‘crosses’ C, as you may verify by
placing a ruler at various angles through P on Figure 1.9. Nevertheless, the
line that is the x-axis seems to be a good candidate for the ‘tangent line’ to
C at (0,0); intuitively, it seems to pass through the curve at a ‘zero angle’.
Let us try to make this idea more mathematically robust.


Consider again a general curve C and a point P on C at which we wish to 
find a tangent line (see Figure 1.10). If Q is a point on the curve close to P, 
then the chord through P and Q is the straight line through these points. If 
C is smooth enough to have a derivative at P then, as Q approaches P, the 
chord through P and Q approaches a well-defined line through P, which is 
defined to be the tangent line to C at P. 


[Figure 1.10: successive chords PQ, for successive positions of Q on the curve C, approaching the tangent line to C at P]


The tangent line at P is the line that has the same slope as C at that point.
To return to the case of the curve $y = f(x) = x^3$ (see Figure 1.9), since
$f'(0) = 0$, the slope of C at (0,0) is 0, so the tangent line to C at (0,0) is
the x-axis, that is, the line y = 0.


More generally, if the curve C is defined by the function y = f(x), then at
the point P = (a, f(a)) = (a, b), the slope of C is $f'(a)$ and the equation
of the tangent line is $y - b = f'(a)(x - a)$, this being the equation of a line
through (a, b) with slope $f'(a)$.
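
The limiting construction of the tangent line can be illustrated numerically. The sketch below uses the cubic $f(x) = x^3$ as its example and assumes its derivative $3x^2$ is supplied by hand; the function names are illustrative. It prints the slopes of chords through P and a nearby point Q as Q approaches P, and then the slope and intercept of the tangent line $y - b = f'(a)(x - a)$ rewritten as y = (slope) x + (intercept).

```python
def f(x):
    return x ** 3


def df(x):
    """Derivative of f, supplied by hand for this sketch."""
    return 3 * x ** 2


def chord_slope(a, step):
    """Slope of the chord through (a, f(a)) and (a + step, f(a + step))."""
    return (f(a + step) - f(a)) / step


def tangent_line(a):
    """Return (slope, intercept) of the tangent line at x = a."""
    slope = df(a)
    return slope, f(a) - slope * a


slopes = [chord_slope(1.0, step) for step in (0.1, 0.01, 0.001)]
print(slopes)               # chord slopes approaching the slope at x = 1
print(tangent_line(1.0))    # tangent line at (1, 1)
print(tangent_line(0.0))    # tangent line at the origin: the x-axis
```

The chord slopes settle towards the value of the derivative, which is exactly the sense in which the chords approach the tangent line.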


In a similar way, a smooth surface without breaks or folds has a tangent 
plane. In the cases of the paraboloid and the hyperboloid in Figure 1.4, the 
surface is horizontal at the point (0,0,0), so, for each of these surfaces, the
tangent plane at (0,0,0) is the (x, y)-plane. In the case of the paraboloid,
the (x, y)-plane touches the surface at that point, but does not cut through
it. The hyperboloid, however, lies partly above and partly below the
(x, y)-plane, so (as with the tangent line at (0,0) to $y = x^3$) the (x, y)-plane cuts
the hyperboloid, despite having the same ‘slope in any direction’ as the
hyperboloid at (0,0,0).


The tangent plane can be defined by a ‘limiting’ construction similar to that 
for the tangent line. We need three points to define a plane, so if P is a 
point on a smooth surface S, we must take two further points Q and R on 
the surface, and we must ensure that P, Q and R never lie on a straight
line (when projected onto the (x, y)-plane). Then, as Q and R separately
approach P (from distinct directions), the plane through P, Q and R will 
approach a well-defined plane, the tangent plane at P (see Figure 1.11). 


Just as the slope of a curve C' at a point P on C is equal to the slope of 
the tangent line at P, so the ‘slope’ of the surface S at the point P on S is 
equal to the ‘slope’ of the tangent plane at P. But what is this slope? 


Imagine that the surface is a hillside and that you are walking across it (see 
Figure 1.12). Let us suppose that you want to measure the ‘slope’ of the hill 
at some particular point. You immediately encounter a problem: in which 
direction should you measure the slope? If you choose a direction pointing 
‘straight up the hill’ you will get one value, and if you choose to move ‘round 
the hill’ you will get another. We shall choose two specific directions in which 
to measure the slope: the x-direction and the y-direction. On a smooth hill, 
this will be sufficient to determine the slope in every direction (as you will 
see in Subsection 1.4). So we start by examining the rate at which a function 
of two variables changes when we keep one of the variables fixed. 


Figure 1.12  The surface z = f(x, y), with the points (a, b, f(a, b)) and (a, y, z) lying above (a, b, 0) and (a, y, 0)


Suppose that you are walking over the surface defined by the equation 
z = f(x,y) and that you are at the point (a, b, f(a,b)) = (a,b,c). If c > 0,
then directly below you is the point (a,b,0) lying in the (x, y)-plane (see
Figure 1.12). Now you begin to move across the surface, taking care to ensure
that the value of x stays fixed at a. As you move, the value of y changes, 
and your height z above the (x, y)-plane varies. We are going to investigate 
the rate of change of the height z with y, which you would recognize as the 
slope (at some particular point) of the path along which you are walking. 
We refer to this slope as the slope of the surface in the y-direction (at the 
point in question). 


There are some technicalities concerning the choice of Q and R, but they are
not important in this context.


Figure 1.11


As a specific example, consider the function

$$f(x, y) = x^2 - y^3 + 2xy^2.$$


Suppose that we are trying to find the slope of this surface in the y-direction 
at the point (2,1,7) on the surface. First we construct the section function 
of f(x,y) with x fixed at 2: 


$$f(2, y) = 4 - y^3 + 4y^2.$$


Then we differentiate this function with respect to y, to obtain

$$\frac{d}{dy} f(2, y) = -3y^2 + 8y.$$


Finally, we put y = 1 into this expression and obtain the value 5 for the
slope of the surface in the y-direction at (2,1,7).


This process would be the same for any fixed value of x and so can be
generalized: fix x at some constant value, then differentiate f(x,y) with
respect to y, using all the standard rules of differentiation and treating x
as a constant. The result is the expression −3y² + 4xy. This is the partial
derivative of the function f(x,y) = x² − y³ + 2xy² with respect to y, denoted
by ∂f/∂y. So we have

$$\frac{\partial f}{\partial y}(x, y) = -3y^2 + 4xy.$$

If we put x = 2 and y = 1 in this expression, then we obtain

$$\frac{\partial f}{\partial y}(2, 1) = 5,$$

which is the same value as we found above.


The same method can be used to find the slope in the x-direction, but this
time we keep y fixed and differentiate with respect to x. We obtain

$$\frac{\partial f}{\partial x}(x, y) = 2x + 2y^2.$$


Putting x = 2 and y = 1 into this expression, we obtain the value 6 for the
slope of the surface in the x-direction at (2,1,7).


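More formally, the partial derivatives of a function f(x,y) with respect to
x and y are given by

$$\frac{\partial f}{\partial x} = \lim_{\delta x \to 0} \frac{f(x + \delta x, y) - f(x, y)}{\delta x} \tag{1.2}$$

and

$$\frac{\partial f}{\partial y} = \lim_{\delta y \to 0} \frac{f(x, y + \delta y) - f(x, y)}{\delta y}, \tag{1.3}$$

where δx and δy denote (small) increments in the values of x and y, respectively.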
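The limit definitions can be explored numerically. The following is a minimal sketch, assuming the example surface f(x, y) = x² − y³ + 2xy² from the text; the helper names `partial_x` and `partial_y` are illustrative, and a central difference is used in place of the one-sided quotient in the definitions because it is usually more accurate for a given small increment.

```python
def f(x, y):
    """Example surface from the text: z = x^2 - y^3 + 2*x*y^2."""
    return x**2 - y**3 + 2 * x * y**2


def partial_x(func, x, y, h=1e-6):
    """Central-difference estimate of the partial derivative in x (y held fixed)."""
    return (func(x + h, y) - func(x - h, y)) / (2 * h)


def partial_y(func, x, y, h=1e-6):
    """Central-difference estimate of the partial derivative in y (x held fixed)."""
    return (func(x, y + h) - func(x, y - h)) / (2 * h)


slope_x = partial_x(f, 2.0, 1.0)  # analytic value: 2x + 2y^2 = 6
slope_y = partial_y(f, 2.0, 1.0)  # analytic value: -3y^2 + 4xy = 5
print(f"slope in x-direction ~ {slope_x:.6f}")
print(f"slope in y-direction ~ {slope_y:.6f}")
```

The estimates should agree closely with the values 6 and 5 obtained by differentiating by hand.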


Note the important difference between the symbols:


‘delta’ symbol δ represents an increment;


‘partial dee’ symbol ∂ represents partial differentiation.


We shall need the more formal definitions later, but for now the following 
definitions will suffice.


The use of the ‘partial dee’ symbol ∂ (rather than d) for partial derivatives
distinguishes them from ordinary derivatives.


The ordinary derivative of f(t) with respect to t is formally defined as

$$\frac{df}{dt} = \lim_{h \to 0} \frac{f(t + h) - f(t)}{h}.$$




Definitions


Given a function f of two variables (say x and y), the partial derivative
∂f/∂x is obtained by differentiating f(x,y) with respect to x while
treating y as a constant. Similarly, the partial derivative ∂f/∂y is
obtained by differentiating f(x,y) with respect to y while treating x as
a constant.


The partial derivatives ∂f/∂x and ∂f/∂y represent the slopes of the
surface z = f(x,y) at the point (x, y, f(x,y)) in the x- and y-directions,
respectively. These partial derivatives are the first-order partial
derivatives or the first partial derivatives of the function f.


The expression ∂f/∂x is read as ‘partial dee f by dee x’.


We shall define second-order and higher-order partial derivatives in Section 2.


We shall also use the alternative notation f_x for ∂f/∂x and f_y for ∂f/∂y.


Example 1.2
Calculate ∂f/∂x and ∂f/∂y for

$$f(x, y) = \sqrt{xy} + xy^2 \quad (x > 0,\ y > 0).$$


Find the slopes of the corresponding surface z = √(xy) + xy² in the x- and
y-directions at the point (4,1,6) on the surface.


Solution 


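First we treat y as a constant and differentiate with respect to x
(remembering that √(xy) = √x √y), to obtain

$$\frac{\partial f}{\partial x} = \frac{\sqrt{y}}{2\sqrt{x}} + y^2.$$


So the slope in the x-direction at (4,1,6) is

$$\frac{\sqrt{1}}{2\sqrt{4}} + 1^2 = \tfrac{5}{4}.$$


Now we treat x as a constant and differentiate f with respect to y, to obtain

$$\frac{\partial f}{\partial y} = \frac{\sqrt{x}}{2\sqrt{y}} + 2xy.$$


So the slope in the y-direction at (4,1,6) is

$$\frac{\sqrt{4}}{2\sqrt{1}} + (2 \times 4 \times 1) = 9.$$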
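As a quick check on the arithmetic in Example 1.2, the sketch below evaluates the function and the two partial derivatives at (4, 1). The function names `f`, `f_x` and `f_y` are chosen here for illustration; the formulas are the ones derived in the solution.

```python
import math


def f(x, y):
    """Surface from Example 1.2: z = sqrt(x*y) + x*y^2, for x > 0 and y > 0."""
    return math.sqrt(x * y) + x * y**2


def f_x(x, y):
    """Partial derivative with respect to x, treating y as a constant."""
    return math.sqrt(y) / (2 * math.sqrt(x)) + y**2


def f_y(x, y):
    """Partial derivative with respect to y, treating x as a constant."""
    return math.sqrt(x) / (2 * math.sqrt(y)) + 2 * x * y


height = f(4, 1)
slope_x = f_x(4, 1)
slope_y = f_y(4, 1)
print(height, slope_x, slope_y)
```

The height confirms that (4, 1, 6) lies on the surface, and the two slopes are 5/4 and 9.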


In general, ∂f/∂x and ∂f/∂y are functions of the two variables x and y, so
in Example 1.2 we could write

$$\frac{\partial f}{\partial x}(x, y) = \frac{\sqrt{y}}{2\sqrt{x}} + y^2, \qquad \frac{\partial f}{\partial y}(x, y) = \frac{\sqrt{x}}{2\sqrt{y}} + 2xy,$$

or, alternatively,

$$f_x(x, y) = \frac{\sqrt{y}}{2\sqrt{x}} + y^2, \qquad f_y(x, y) = \frac{\sqrt{x}}{2\sqrt{y}} + 2xy,$$

if we wish to emphasize the fact that the partial derivatives are themselves
functions. Thus we could write

$$f_x(4, 1) = \tfrac{5}{4}, \qquad f_y(4, 1) = 9,$$

for the slopes in the x- and y-directions, respectively, at the point in question.


*Exercise 1.4
Given f(x,y) = (x² + y³) sin(xy), calculate ∂f/∂x and ∂f/∂y.


Of course, the two independent variables need not be denoted by x and y; 
any variable names will do, as the next example shows. 


Example 1.3
Given f(α,t) = α sin(αt), calculate f_α(π/2, 1) and f_t(π/2, 1).


Solution
Differentiating partially with respect to α gives f_α = sin(αt) + αt cos(αt),
so

$$f_\alpha(\tfrac{\pi}{2}, 1) = \sin\tfrac{\pi}{2} + \tfrac{\pi}{2}\cos\tfrac{\pi}{2} = 1.$$


Differentiating partially with respect to t gives f_t = α² cos(αt), so

$$f_t(\tfrac{\pi}{2}, 1) = (\tfrac{\pi}{2})^2 \cos\tfrac{\pi}{2} = 0.$$


*Exercise 1.5 


Given u(θ,φ) = sin θ + φ tan θ, calculate u_θ and u_φ.


Often, a mathematical model will generate a relationship between variables.
For example, we have the formula V = (1/3)πr²h for the volume V of a cone
in terms of the radius r of its circular base and its height h. We could
introduce a function f(r,h) = (1/3)πr²h and write the partial derivatives as
∂f/∂r and ∂f/∂h. But it is often more convenient to let V denote both
the variable and the function that defines it, enabling us to write ∂V/∂r in
place of ∂f/∂r, and ∂V/∂h in place of ∂f/∂h, thus keeping the number of
symbols to a minimum.


The notion of partial derivative can be extended to functions of more than
two variables. For example, for a function f(x,y,t) of three variables, to
calculate ∂f/∂x, we keep y and t fixed, and differentiate with respect to x
(and similarly for the other partial derivatives).


*Exercise 1.6
(a) Find the first partial derivatives of the function

$$f(x, y, t) = x^2 y^3 t^4 + 2xy + 4t^2 x^2 + y.$$


(b) Given z = (1 + x)² + (1 + y)³, calculate ∂z/∂x and ∂z/∂y. Sketch the
section functions z(x,0) and z(0,y). What is the relevance of the value
of (∂z/∂x)(0,0) to the graph of z(x,0), and what is the relevance of the
value of (∂z/∂y)(0,0) to the graph of z(0,y)?


1.4 Slope in an arbitrary direction 


Suppose that the walker in Figure 1.12 is not walking in either the x-direction
or the y-direction, but in some other direction. Can we use the partial
derivatives (∂f/∂x)(a,b) and (∂f/∂y)(a,b) to find the slope at the point
(a,b,c) in this other direction?


To answer this question, we start by looking at the formal definitions of
partial derivatives in Equations (1.2) and (1.3), and use them to investigate
what happens when we move a small amount in a particular direction.
For a function z = f(x,y), the equations are

$$\frac{\partial f}{\partial x} = \lim_{\delta x \to 0} \frac{f(x + \delta x, y) - f(x, y)}{\delta x}, \tag{1.2}$$

$$\frac{\partial f}{\partial y} = \lim_{\delta y \to 0} \frac{f(x, y + \delta y) - f(x, y)}{\delta y}. \tag{1.3}$$


Small increments 


If we write δz_1 = f(x + δx, y) − f(x,y) and δz_2 = f(x, y + δy) − f(x,y), then
δz_1 is the change, or increment, in the value of z corresponding to a (small)
increment δx in the value of x. Similarly, δz_2 is the increment in the value
of z corresponding to a (small) increment δy in the value of y. So we have

$$\frac{\partial f}{\partial x} = \lim_{\delta x \to 0} \frac{\delta z_1}{\delta x}, \qquad \frac{\partial f}{\partial y} = \lim_{\delta y \to 0} \frac{\delta z_2}{\delta y}. \tag{1.4}$$

We know from Subsection 1.3 that ∂f/∂x and ∂f/∂y are the slopes of
z = f(x,y) in the x- and y-directions, respectively. To find the slope of
z = f(x,y) in an arbitrary direction, we need to find the increment δz in the
value of f when we move a short distance in that direction. Such a movement
can be achieved by small increments δx and δy, and Equations (1.4)
relate these approximately to the corresponding increments δz_1 and δz_2, as
shown below.


Figure 1.13  Part of the surface z = f(x, y) above the points (a, b) and (a + δx, b + δy) in the (x, y)-plane


Figure 1.13 shows a small part of the surface z = f(x,y) and the point (a,b)
in the (x, y)-plane. We are interested in the effect on z when x and y are
increased by the small amounts δx and δy, respectively. The increment δz is
produced by moving first from (a,b) to (a + δx, b), and then from (a + δx, b)
to (a + δx, b + δy). We shall specify δz_1 to be the increment in z on moving
from (a,b) to (a + δx, b), and δz_2 to be the increment in z on moving from
(a + δx, b) to (a + δx, b + δy).


We consider δz_1 first. This is f(a + δx, b) − f(a,b), the difference between
the function values at (a,b) and (a + δx, b). Since x has moved from a to
a + δx while y has remained constant, the increment in z is approximately
equal to the increment in x multiplied by the partial derivative with respect
to x, i.e.

$$\delta z_1 \simeq \frac{\partial f}{\partial x}(a, b)\, \delta x. \tag{1.5}$$

This follows from the first of Equations (1.4).


The expressions of the form δz/δx here are quotients, not derivatives.


The approximation used here holds because for small δx, we can take the
slope ∂f/∂x in the x-direction to be almost constant on the interval
between x = a and x = a + δx. Similarly, for small δy, the slope ∂f/∂y in
the y-direction can be taken to be almost constant on the interval between
y = b and y = b + δy.


The increment δz_2 is obtained by holding x constant at the value a + δx
while incrementing y from b to b + δy. It is thus approximately equal to the
increment in y multiplied by the partial derivative with respect to y, i.e.

$$\delta z_2 \simeq \frac{\partial f}{\partial y}(a + \delta x, b)\, \delta y. \tag{1.6}$$

At this point in the argument, it is necessary to assume that f is a
sufficiently smooth function so that the methods of calculus that we require
can be applied. We assume, in particular, that the partial derivatives are
continuous. So, for small δx, we have

$$\frac{\partial f}{\partial y}(a + \delta x, b) \simeq \frac{\partial f}{\partial y}(a, b). \tag{1.7}$$

Putting Equations (1.5), (1.6) and (1.7) together, we now have our expression
linking small increments in z with small increments in x and y:

$$\delta z = \delta z_1 + \delta z_2 \simeq \frac{\partial f}{\partial x}(a, b)\, \delta x + \frac{\partial f}{\partial y}(a, b)\, \delta y, \tag{1.8}$$

which may be easier to remember when written in the form

$$\delta z \simeq \frac{\partial z}{\partial x}\, \delta x + \frac{\partial z}{\partial y}\, \delta y. \tag{1.9}$$


The approximation (1.9) has an important application to error analysis, as 
the next example shows. 


Example 1.4
The volume V of a cone of height h with base radius r is given by

$$V = \tfrac{1}{3}\pi r^2 h.$$


Determine the approximate change in the volume if the radius increases
from 2 to 2 + δr and the height increases from 5 to 5 + δh.


If the radius and height measurements are each subject to an error of
magnitude up to 0.01, how accurate is the estimate

$$V \simeq \tfrac{1}{3} \times \pi \times 2^2 \times 5 = \tfrac{20}{3}\pi \quad (= 20.94, \text{ to four significant figures})$$

of the volume?


Solution 


Calculating the partial derivatives of V = V(r,h) with respect to r and h,
we have

$$\frac{\partial V}{\partial r} = \tfrac{2}{3}\pi r h, \qquad \frac{\partial V}{\partial h} = \tfrac{1}{3}\pi r^2.$$

Setting r = 2 and h = 5, we calculate

$$\frac{\partial V}{\partial r}(2, 5) = \tfrac{2}{3}\pi \times 2 \times 5 = \tfrac{20}{3}\pi, \qquad \frac{\partial V}{\partial h}(2, 5) = \tfrac{1}{3}\pi \times 2^2 = \tfrac{4}{3}\pi,$$

and thus, from approximation (1.8),

$$\delta V \simeq \tfrac{20}{3}\pi\, \delta r + \tfrac{4}{3}\pi\, \delta h,$$

which answers the first part of the question.


If the maximum possible magnitudes of δr and δh are 0.01, the maximum
possible magnitude of δV is approximately

$$(\tfrac{20}{3} + \tfrac{4}{3})\pi \times 0.01 = 8\pi \times 0.01 \simeq 0.25.$$

Thus the volume estimate of 20.94 may compare with an actual value as
high as 21.19 or as low as 20.69. The estimate is accurate to only two
significant figures (and should be given as V = 21, correct to two significant
figures).
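The error estimate in Example 1.4 can be compared with the exact change in volume. The sketch below is illustrative only: the variable names are assumptions, and it simply evaluates the linear bound from approximation (1.8) alongside the actual changes when both measurements are too large or too small by 0.01.

```python
import math


def cone_volume(r, h):
    """Volume of a cone with base radius r and height h."""
    return math.pi * r**2 * h / 3


r, h = 2.0, 5.0
max_error = 0.01

# Partial derivatives of V evaluated at (r, h) = (2, 5).
dV_dr = 2 * math.pi * r * h / 3  # 20*pi/3
dV_dh = math.pi * r**2 / 3       # 4*pi/3

# Linear (first-order) bound on the magnitude of the change in V.
linear_bound = (abs(dV_dr) + abs(dV_dh)) * max_error

# Exact changes when both measurements err in the same direction.
actual_increase = cone_volume(r + max_error, h + max_error) - cone_volume(r, h)
actual_decrease = cone_volume(r, h) - cone_volume(r - max_error, h - max_error)

print(f"linear bound    = {linear_bound:.5f}")
print(f"actual increase = {actual_increase:.5f}")
print(f"actual decrease = {actual_decrease:.5f}")
```

The exact changes differ from the linear bound only in the third decimal place, which is consistent with the approximation being good for small increments.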


*Exercise 1.7 


Given z = (1 + x)² + (1 + y)³, find the approximate increment δz in z when
x is incremented from 0 to δx and y is incremented from 2 to 2 + δy.


(Hint: You may wish to use your solution to Exercise 1.6(b).) 


Rate of change along a curve 


Now we focus on the rate of change of z = f(x,y) if (x,y) is constrained to
move along a curve in the (x, y)-plane. Let us suppose that x and y are
themselves functions of a parameter t, so that as t varies, the point (x(t), y(t))
moves along a curve in the (x, y)-plane, passing through (a,b) when t = t_0
and through (a + δx, b + δy) when t = t_0 + δt. From Equation (1.9) we have

$$\frac{\delta z}{\delta t} \simeq \frac{\partial z}{\partial x}\frac{\delta x}{\delta t} + \frac{\partial z}{\partial y}\frac{\delta y}{\delta t}. \tag{1.10}$$

Having introduced the parameter t, we can think of x, y and z as functions
of the single variable t. Thinking of them in this way, we have
δx = x(t_0 + δt) − x(t_0), δy = y(t_0 + δt) − y(t_0) and δz = z(t_0 + δt) − z(t_0).
Thus

$$\frac{\delta x}{\delta t} = \frac{x(t_0 + \delta t) - x(t_0)}{\delta t}, \qquad \frac{\delta y}{\delta t} = \frac{y(t_0 + \delta t) - y(t_0)}{\delta t}, \qquad \frac{\delta z}{\delta t} = \frac{z(t_0 + \delta t) - z(t_0)}{\delta t}.$$

So, from the definition of an ordinary derivative, as δt → 0, these become
dx/dt, dy/dt and dz/dt, respectively. Hence, as δt → 0, approximation
(1.10) becomes

$$\frac{dz}{dt} = \frac{\partial z}{\partial x}\frac{dx}{dt} + \frac{\partial z}{\partial y}\frac{dy}{dt}.$$


This may remind you of the Chain Rule for the derivative of z with respect
to t if z is a function of the single variable x, and x is a function of t; i.e. if
z = z(x(t)), then

$$\frac{dz}{dt} = \frac{dz}{dx}\frac{dx}{dt}.$$

Indeed, the formula for the rate of change along a parametrized curve is the
two-dimensional analogue of the Chain Rule of ordinary differentiation.


Chain Rule


Let z = f(x,y) be a function whose first partial derivatives exist and
are continuous. Then the rate of change of z with respect to t along a
curve parametrized by (x(t), y(t)) is given by

$$\frac{dz}{dt} = \frac{\partial z}{\partial x}\frac{dx}{dt} + \frac{\partial z}{\partial y}\frac{dy}{dt}. \tag{1.11}$$
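The Chain Rule can be checked numerically on a simple case. The sketch below is not taken from the text: the surface z = x²y and the circular curve (cos t, sin t) are assumptions chosen purely for illustration. It compares the Chain Rule expression with a direct finite-difference estimate of dz/dt along the curve.

```python
import math


def z(x, y):
    """Illustrative surface (an assumption, not from the text): z = x^2 * y."""
    return x**2 * y


def z_x(x, y):
    """Partial derivative of z with respect to x."""
    return 2 * x * y


def z_y(x, y):
    """Partial derivative of z with respect to y."""
    return x**2


def curve(t):
    """Illustrative parametrized curve: the unit circle (cos t, sin t)."""
    return math.cos(t), math.sin(t)


def curve_velocity(t):
    """Ordinary derivatives (dx/dt, dy/dt) along the curve."""
    return -math.sin(t), math.cos(t)


def chain_rule_rate(t):
    """dz/dt from the Chain Rule: z_x * dx/dt + z_y * dy/dt."""
    x, y = curve(t)
    dx_dt, dy_dt = curve_velocity(t)
    return z_x(x, y) * dx_dt + z_y(x, y) * dy_dt


def numerical_rate(t, dt=1e-6):
    """Central-difference estimate of dz/dt along the curve."""
    return (z(*curve(t + dt)) - z(*curve(t - dt))) / (2 * dt)


t0 = 0.7
print(chain_rule_rate(t0), numerical_rate(t0))
```

The two printed values should agree to several decimal places.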


*Exercise 1.8 


Given z = sin x − 3 cos y, find the rate of change of z along the curve
(x(t), y(t)), where a(t) =, y(t) = 2t. 


You may recall from previous studies that a curve described in terms of a
parameter, say t, in this way is called a parametrized curve.


The expressions of the form dx/dt here are derivatives, not quotients.


On the curve, z, x and y are functions of t only, so it is consistent to
write, for example, dz/dt rather than ∂z/∂t.


The above form of the Chain Rule is easy to remember, but if there is a
reason to emphasize that the partial derivatives are evaluated at a particular
point, then it may be convenient to write it as

$$\frac{dz}{dt}(t_0) = \frac{\partial f}{\partial x}(a, b) \times \frac{dx}{dt}(t_0) + \frac{\partial f}{\partial y}(a, b) \times \frac{dy}{dt}(t_0),$$

where x(t_0) = a and y(t_0) = b.


The Chain Rule is of fundamental importance and we shall refer back to it 
both in this unit and in the remainder of the course. For the moment, we 
use it to continue our discussion of the slope of a surface. 


Slope in an arbitrary direction 


We want to find the slope of z = f(x,y) at a point (a,b,c) = (a,b, f(a, b)) on 
the surface in an arbitrary given direction. This given direction is specified 
by a straight line through (a,b) in the (x, y)-plane. Suppose that on this 
line x and y are functions of the parameter t, so that x = x(t) and y = y(t). 
Let us also take x(0) = a and y(0) = b. Then our straight line can be 
parametrized as 


$$x(t) = a + t\cos\alpha, \qquad y(t) = b + t\sin\alpha,$$


where α is the anticlockwise angle the line makes with the positive x-axis
and t measures the distance along this line, as illustrated in Figure 1.14. As 
(x,y) moves along this line, the point (x,y, f(x, y)) moves along a curve in 
the surface z = f(x,y) (see Figure 1.15). 


Figure 1.15 


The advantage of defining the line in terms of the parameter t is that the
derivative of z with respect to t is the quantity we are looking for: the
slope of the surface z = f(x,y) at the point (a,b,c), in the direction that
makes an angle α with the direction of the x-axis (see Figure 1.15).
Moreover, we can now use the Chain Rule given in Equation (1.11). Since
dx/dt = cos α and dy/dt = sin α, we obtain

$$\frac{dz}{dt} = \frac{\partial z}{\partial x}\cos\alpha + \frac{\partial z}{\partial y}\sin\alpha. \tag{1.12}$$


Figure 1.14  The line through (a, b) with general point (a + t cos α, b + t sin α)


That is, we have shown that the slope of the surface z = f(x,y) at the
point (a, b, f(a,b)) in the direction making an anticlockwise angle α with
the positive x-axis is

$$\frac{\partial f}{\partial x}(a, b)\cos\alpha + \frac{\partial f}{\partial y}(a, b)\sin\alpha$$

or

$$f_x(a, b)\cos\alpha + f_y(a, b)\sin\alpha.$$

This defines the slope of a surface in a particular direction. But what is
the slope of the surface at (a, b, f(a,b))? The answer is that it can be
thought of as a vector, with a component f_x(a,b) in the x-direction and a
component f_y(a,b) in the y-direction, i.e. as the vector f_x(a,b)i + f_y(a,b)j.
The advantage of this formulation is that the slope in the direction of an
arbitrary unit vector d = (cos α)i + (sin α)j in the (x, y)-plane is the dot
product of the vectors f_x(a,b)i + f_y(a,b)j and d. This idea is important
enough to deserve some terminology of its own.


|d| = √(cos²α + sin²α) = 1


Definition
The vector grad f(a,b) = f_x(a,b)i + f_y(a,b)j is called the gradient of
the function f(x,y) at the point (a,b), and is alternatively denoted by
∇f(a,b). Thus ∇f(x,y) is a (vector) function of two variables, called
the gradient function.


The symbol ∇ is called ‘del’, or sometimes ‘nabla’.


The result concerning the slope of a surface can now be written as follows. 


Slope of a surface


Let z = f(x,y) be a surface described by a function whose first partial
derivatives exist and are continuous. Then the slope of the surface
at the point (a, b, f(a,b)) in the direction of the unit vector
d = (cos α)i + (sin α)j in the (x, y)-plane is the dot product

$$(\nabla f(a, b)) \cdot \mathbf{d} = f_x(a, b)\cos\alpha + f_y(a, b)\sin\alpha. \tag{1.13}$$
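The dot-product formula translates directly into a short calculation. The sketch below assumes the earlier example surface f(x, y) = x² − y³ + 2xy², whose partial derivatives were found above; the function names are illustrative. Note that a direction given as a general vector must first be scaled to unit length before the dot product is taken.

```python
import math


def gradient(x, y):
    """Gradient of f(x, y) = x^2 - y^3 + 2*x*y^2 as the pair (f_x, f_y)."""
    return 2 * x + 2 * y**2, -3 * y**2 + 4 * x * y


def slope_at_angle(x, y, alpha):
    """Slope in the direction at anticlockwise angle alpha from the positive x-axis."""
    fx, fy = gradient(x, y)
    return fx * math.cos(alpha) + fy * math.sin(alpha)


def slope_along_vector(x, y, dx, dy):
    """Slope in the direction of (dx, dy), which is normalized to a unit vector first."""
    length = math.hypot(dx, dy)
    fx, fy = gradient(x, y)
    return (fx * dx + fy * dy) / length


print(slope_at_angle(2, 1, 0.0))          # x-direction
print(slope_at_angle(2, 1, math.pi / 2))  # y-direction
print(slope_along_vector(2, 1, 3, 4))     # direction of 3i + 4j
```

With α = 0 and α = π/2 the formula reproduces the slopes 6 and 5 in the x- and y-directions found earlier for this surface at (2, 1, 7).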


Exercise 1.9
Given the surface S defined by z = f(x,y) = 2x²y + 3xy³, find the following.
(a) The gradient function ∇f(x,y) = f_x i + f_y j.

(b) The slope of the surface at the point (2,1,14) in the direction of the
vector d = 3i + 4j.

*Exercise 1.10 


By varying the angle α (measured anticlockwise from the positive x-axis),
we can examine the slope of the surface at the fixed point (a,b,c) in any
direction we wish.


Calculate the greatest slope of the surface z = f(x,y) = (1/2)x² + √3 y³ at the
point (2, 1, 2 + √3) on the surface. Show that this greatest slope is in the
direction of ∇f(2,1).


In general, as the result from Exercise 1.10 illustrates, the direction of the
gradient function ∇f(x,y) at a point (a,b) corresponds to the direction of
greatest slope of the surface z = f(x,y) at the point (a, b, f(a,b)). To see
this, we observe that ∇f · d = |∇f| |d| cos θ = |∇f| cos θ, where θ is the
angle between ∇f and d, and θ = 0 gives the maximum value.
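The claim that the greatest slope lies along the gradient can be illustrated by a brute-force scan over directions. This is a sketch under stated assumptions: the gradient components 6 and 5 are those of the earlier example surface x² − y³ + 2xy² at (2, 1), and the grid of 3600 angles is an arbitrary choice.

```python
import math

# Gradient components f_x and f_y at the point of interest (from the earlier example).
fx, fy = 6.0, 5.0


def slope(alpha):
    """Slope in the direction at angle alpha: the dot product of the gradient with d."""
    return fx * math.cos(alpha) + fy * math.sin(alpha)


# Scan directions in steps of 0.1 degrees and keep the steepest one.
angles = [2 * math.pi * k / 3600 for k in range(3600)]
best_angle = max(angles, key=slope)
greatest_slope = slope(best_angle)

# Direction and magnitude of the gradient itself.
gradient_angle = math.atan2(fy, fx)
gradient_length = math.hypot(fx, fy)

print(f"steepest direction found: {best_angle:.4f} rad, slope {greatest_slope:.4f}")
print(f"gradient direction:       {gradient_angle:.4f} rad, |grad f| {gradient_length:.4f}")
```

Up to the resolution of the scan, the steepest direction coincides with the direction of the gradient, and the greatest slope equals the magnitude of the gradient.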
End-of-section Exercises


Exercise 1.11 


Given z = y sin x, find the rate of change of z along the curve (x(t), y(t)),
where x = e^t and y = t². Evaluate this rate of change at t = 0.


Exercise 1.12 


Given the surface defined by z = f(x,y) = (x + 2y)³ − (2x − y)², find each
of the following:


(a) the gradient function ∇f(x,y) = f_x i + f_y j;


(b) the slope of the surface at the point (1,0,−3) in the direction of the
vector i + j.


2 Taylor polynomials 


The aim of this unit is to extend the techniques of calculus to functions 
of two (or more) variables, in order to be able to tackle a wider range of 
problems in applied mathematics. However, before continuing, it is neces- 
sary to review Taylor polynomials and Taylor approximations as they apply 
to functions of one variable. These are revised in Subsection 2.1. Sub- 
section 2.2 introduces higher-order partial derivatives of functions of two 
variables (which are conceptually very like their counterparts for functions 
of one variable). In Subsection 2.3 we generalize Taylor polynomials and 
Taylor approximations to functions of two variables. 


2.1 Functions of one variable 


Many useful functions (e.g. trigonometric and exponential functions) cannot 
generally be evaluated exactly, so the best we can do is approximate them. 
Polynomial functions are often used as approximations because they are 
easy to evaluate and manipulate. In many cases, a good approximation 
to a function f can be obtained near some point a in the domain of f by 
finding the polynomial of a chosen degree that agrees with f at a, and also 
agrees with the first few derivatives of f evaluated at a. For example, the 
function f(x) = sin x can be approximated reasonably well near x = 0 by
the first-order polynomial p_1(x) = x (see Figure 2.1).


So sin x ≃ x is a good approximation for small x (say |x| < 0.1). This is
because the functions y = x and y = sin x at x = 0 agree in value (they
are both 0), in their first derivatives (they are both 1), and in their second
derivatives (they are both 0).


Figure 2.1

Figure 2.2


p_1 has the subscript 1 because it is a first-order polynomial.


Similarly, the function f(x) = cos x can be approximated quite well near
x = 0 by the quadratic (second-order) polynomial p_2(x) = 1 − (1/2)x² (see
Figure 2.2). The reason is that p_2(x) and cos x agree at x = 0 in their values,
and also in the values of their first, second and third derivatives.


Exercise 2.1 


Verify the above statements concerning p_1(x) and p_2(x). That is, check that
p_1(0) = sin 0, p_1'(0) = cos 0, p_1''(0) = −sin 0, p_2(0) = cos 0, p_2'(0) = −sin 0,
p_2''(0) = −cos 0 and p_2'''(0) = sin 0.


We call p_1(x) = x the tangent approximation to the function f(x) = sin x
near x = 0, or the Taylor polynomial of degree 1 for f(x) = sin x about
x = 0. Similarly, p_2(x) = 1 − (1/2)x² is the quadratic approximation to the
function f(x) = cos x near x = 0, or the Taylor polynomial of degree 2 for
f(x) = cos x about x = 0.


In fact, p_1(x) is also the quadratic approximation to sin x, and p_2(x) is
also the cubic approximation to cos x.


Definition
For a function f(x) that has n continuous derivatives near x = a, the
Taylor polynomial of degree n about x = a, or the nth-order
Taylor polynomial about x = a, is

$$p_n(x) = f(a) + f'(a)(x - a) + \frac{1}{2!}f''(a)(x - a)^2 + \frac{1}{3!}f'''(a)(x - a)^3 + \cdots + \frac{1}{n!}f^{(n)}(a)(x - a)^n. \tag{2.1}$$

If a = 0, expression (2.1) becomes the simpler expression

$$p_n(x) = f(0) + f'(0)x + \frac{1}{2!}f''(0)x^2 + \frac{1}{3!}f'''(0)x^3 + \cdots + \frac{1}{n!}f^{(n)}(0)x^n. \tag{2.2}$$


We use the phrases ‘Taylor polynomial of degree n’ and ‘nth-order Taylor
polynomial’ synonymously.


When a = 0, this is known as a Maclaurin series.
Example 2.1 


For the function f(x) = e^{2x}, calculate f(0), f'(0), f''(0) and f'''(0). Write
down the third-order Taylor polynomial p_3(x) for f(x) about x = 0.


Solution

Since f(x) = e^{2x}, it follows that

$$f'(x) = 2e^{2x}, \qquad f''(x) = 4e^{2x}, \qquad f'''(x) = 8e^{2x}.$$

Therefore

$$f(0) = 1, \qquad f'(0) = 2, \qquad f''(0) = 4, \qquad f'''(0) = 8.$$

The third-order Taylor polynomial for e^{2x} about x = 0 is therefore

$$p_3(x) = 1 + 2x + \frac{4}{2!}x^2 + \frac{8}{3!}x^3 = 1 + 2x + 2x^2 + \tfrac{4}{3}x^3.$$
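To see how the approximation improves with the order of the polynomial, the sketch below evaluates the Taylor polynomials of e^{2x} about x = 0 at an arbitrarily chosen point, x = 0.5. The function name `taylor_exp_2x` is illustrative; it relies on the fact, visible in the solution above, that the kth derivative of e^{2x} at 0 is 2^k.

```python
import math


def taylor_exp_2x(x, n):
    """nth-order Taylor polynomial of e^(2x) about x = 0, using f^(k)(0) = 2^k."""
    return sum(2**k * x**k / math.factorial(k) for k in range(n + 1))


x = 0.5
exact = math.exp(2 * x)
errors = []
for n in range(4):
    approx = taylor_exp_2x(x, n)
    errors.append(abs(exact - approx))
    print(f"n = {n}: p_n({x}) = {approx:.6f}, error = {errors[-1]:.6f}")
```

Each additional term reduces the error at this point, in line with the remark that higher-order Taylor polynomials give successively better approximations close to the expansion point.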


For the functions that you will meet in the remainder of this course, succes- 
sive higher-order Taylor polynomials will give successively better approxi- 
mations, at least for values of x that are reasonably close to a. This claim 
will be enhanced in Unit 26, where you will see a theorem (Taylor’s Theo- 
rem) that makes a statement about the possible size of the error involved in 
approximating a function by a Taylor polynomial. 


Exercise 2.2 


For f(x) = e^{2x} (as in Example 2.1), write down the Taylor polynomials
p_0(x), p_1(x) and p_2(x) about x = 0. Evaluate p_0(0.1), p_1(0.1) and p_2(0.1),
and compare these values with the value of f(0.1) obtained on a calculator.


One important application of Taylor polynomials is in examining the local
behaviour of a function. Suppose that the function f(t) has a stationary 
point at t =a (so f’(a) = 0). One consequence of Taylor’s Theorem is that 
close to t = a, the behaviour of f(t) will be the same as the behaviour of 
the second-order Taylor polynomial for f(t) about t = a. So we can use the 
second-order Taylor polynomial about t = a to determine the nature of this 
stationary point. Close to t = a, we have 


$$f(t) \simeq p_2(t) = f(a) + \frac{1}{2!}f''(a)(t - a)^2.$$


The behaviour of the function f(t) near t = a is determined by the sign of
f''(a), assuming f''(a) ≠ 0 (see Figure 2.3). In fact,

$$f(t) - f(a) \simeq \frac{1}{2!}f''(a)(t - a)^2;$$

thus if f''(a) ≠ 0, the right-hand side of this equation does not change sign
near t = a. So either f(t) > f(a) near t = a (if f''(a) > 0), or f(t) < f(a)
near t = a (if f''(a) < 0).


So if f''(a) > 0, f(t) has a local minimum at t = a, and if f''(a) < 0, f(t) has
a local maximum at t = a. However, if f''(a) = 0, the polynomial p_2(t) can
tell us nothing about the nature of the stationary point of f(t) at t = a. This
is the result known as the Second Derivative Test that you saw in Section 1.
To illustrate the usefulness of Taylor polynomials, let us examine the case 
when f”(a) = 0 a little further. (In this case, the Second Derivative Test is 
of no help.) 


Example 2.2 


Suppose that you are told that f'(a) = f''(a) = 0, but f'''(a) ≠ 0, for a
function f(t). If f(t) is approximated by its third-order Taylor polynomial p_3(t),
what does p_3(t) tell you about the stationary point of f(t) at t = a?


Solution
We have

$$f(t) \simeq p_3(t) = f(a) + \frac{1}{3!}f'''(a)(t - a)^3,$$

so

$$f(t) - f(a) \simeq \frac{1}{3!}f'''(a)(t - a)^3.$$


This will change sign as t passes through a (whatever the sign of f'''(a)).
It follows that the stationary point is neither a local maximum nor a local
minimum; such a point is a point of inflection.
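The reasoning of the Second Derivative Test and of Example 2.2 can be summarized as a small decision procedure. The helper `classify_stationary_point` below is hypothetical, written only to mirror the argument in the text, and the test function f(t) = t³ is an assumption chosen because f'(0) = f''(0) = 0 and f'''(0) = 6.

```python
def classify_stationary_point(second_derivative, third_derivative):
    """Classify a stationary point from f''(a) and, if that vanishes, f'''(a)."""
    if second_derivative > 0:
        return "local minimum"
    if second_derivative < 0:
        return "local maximum"
    if third_derivative != 0:
        return "point of inflection"
    return "undetermined by derivatives up to third order"


def f(t):
    """Illustrative function with a stationary point of inflection at t = 0."""
    return t**3


# f(t) - f(a) on either side of a = 0: the sign changes as t passes through a.
left_difference = f(-0.1) - f(0.0)
right_difference = f(0.1) - f(0.0)

print(classify_stationary_point(0, 6))
print(left_difference, right_difference)
```

The change of sign of f(t) − f(a) across t = a is what rules out both a local maximum and a local minimum.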


You should take particular notice of the reasoning in Example 2.2, because 
we shall apply a similar process in discussing functions of two variables. 
It is the sign of f(t) — f(a) near t =a that determines the nature of the 
stationary point at t= a. 


If we are concerned with the local behaviour of f(t) near t = a, then we are
interested in the behaviour of f(t) − f(a) when t − a is small (and certainly
less than 1). If t − a is small, then (t − a)² is even smaller, while higher
powers of t − a are smaller still. When, for example, we use the second-order
Taylor polynomial as an approximation to f(t), we are effectively ignoring
terms involving (t − a)³, (t − a)⁴, .... This is another idea that extends to
functions of two variables.


The change in independent variable from x to t has no particular significance.


Figure 2.3  Graphs of f(t) near t = a for the cases f''(a) > 0 and f''(a) < 0


When a Taylor polynomial is used to approximate a function, we refer to the
Taylor polynomial as a Taylor approximation to the function.


2.2 Higher-order partial derivatives 


In the next subsection we shall extend the concept of Taylor polynomials to 
functions of two variables, but first we must develop the concept of partial 
derivatives a little further. 


You saw in Section 1 that we can differentiate a function of two variables 
f(x,y) partially with respect to x and partially with respect to y, to obtain 
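the partial derivatives ∂f/∂x and ∂f/∂y. For example, if f(x,y) = sin(xy),
then ∂f/∂x = y cos(xy) and ∂f/∂y = x cos(xy). The partial derivatives are
themselves functions of two variables, so it is possible to calculate their
partial derivatives. In the case of the function f(x,y) = sin(xy), we may
differentiate ∂f/∂x partially with respect to x and obtain

$$\frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right) = \frac{\partial}{\partial x}(y\cos(xy)) = -y^2\sin(xy). \tag{2.3}$$


Each of ∂f/∂x and ∂f/∂y can be partially differentiated with respect to
either variable, so, for this particular function f(x,y), we have, in addition
to Equation (2.3),

$$\frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right) = \frac{\partial}{\partial y}(y\cos(xy)) = \cos(xy) - xy\sin(xy),$$

$$\frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right) = \frac{\partial}{\partial x}(x\cos(xy)) = \cos(xy) - xy\sin(xy),$$

$$\frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right) = \frac{\partial}{\partial y}(x\cos(xy)) = -x^2\sin(xy).$$


Definition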


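The second-order partial derivatives (or second partial derivatives)
of a function f(x,y) are

$$\frac{\partial^2 f}{\partial x^2} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right), \qquad \frac{\partial^2 f}{\partial y^2} = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right),$$

$$\frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right), \qquad \frac{\partial^2 f}{\partial y\,\partial x} = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right).$$


They are often abbreviated as f_xx, f_yy, f_xy and f_yx, respectively.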


We can extend the ideas here to obtain higher-order partial derivatives of 
any order. For example, we can obtain third-order partial derivatives by 
partially differentiating the second-order partial derivatives. 


Example 2.3 
Determine the second-order partial derivatives of the function 


$$f(x, y) = e^x \cos y + x^2 - y + 1.$$


Solution
We have

$$\frac{\partial f}{\partial x} = e^x\cos y + 2x, \qquad \frac{\partial f}{\partial y} = -e^x\sin y - 1,$$

so

$$\frac{\partial^2 f}{\partial x^2} = e^x\cos y + 2, \qquad \frac{\partial^2 f}{\partial x\,\partial y} = -e^x\sin y,$$

$$\frac{\partial^2 f}{\partial y\,\partial x} = -e^x\sin y, \qquad \frac{\partial^2 f}{\partial y^2} = -e^x\cos y.$$


Both notations are in common use, and we shall use them interchangeably.


*Exercise 2.3


Given f(x,y) = x sin(xy), calculate f_xx, f_yy, f_xy and f_yx.


In both Example 2.3 and Exercise 2.3 (as well as in the work on
f(x,y) = sin(xy) at the beginning of this subsection), you can see that

$$\frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial^2 f}{\partial y\,\partial x}, \tag{2.4}$$

that is, f_xy = f_yx. This is no accident; this result is always true, provided
that the function f(x,y) is sufficiently smooth for the second-order partial
derivatives to exist and to be continuous.


Theorem 2.1 Mixed Derivative Theorem


For any function f(x,y) that is sufficiently smooth for the second-order
partial derivatives to exist and to be continuous,

$$\frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial^2 f}{\partial y\,\partial x}.$$


This can be written as f_xy = f_yx.


We assume throughout the remainder of this unit that the functions we deal 
with are smooth enough for the Mixed Derivative Theorem to apply. 
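The equality of the mixed derivatives can be illustrated numerically for f(x, y) = sin(xy). In the sketch below, the first partial derivatives are the ones computed by hand at the start of this subsection; each is then differentiated numerically with respect to the other variable. The helper names and the sample point (0.8, 1.3) are assumptions made for illustration.

```python
import math


def f_x(x, y):
    """First partial derivative of sin(x*y) with respect to x."""
    return y * math.cos(x * y)


def f_y(x, y):
    """First partial derivative of sin(x*y) with respect to y."""
    return x * math.cos(x * y)


def d_dx(g, x, y, h=1e-6):
    """Central-difference estimate of the partial derivative of g with respect to x."""
    return (g(x + h, y) - g(x - h, y)) / (2 * h)


def d_dy(g, x, y, h=1e-6):
    """Central-difference estimate of the partial derivative of g with respect to y."""
    return (g(x, y + h) - g(x, y - h)) / (2 * h)


x0, y0 = 0.8, 1.3
f_xy = d_dx(f_y, x0, y0)  # differentiate f_y with respect to x
f_yx = d_dy(f_x, x0, y0)  # differentiate f_x with respect to y
exact = math.cos(x0 * y0) - x0 * y0 * math.sin(x0 * y0)

print(f_xy, f_yx, exact)
```

Both numerical estimates should agree with each other and with the expression cos(xy) − xy sin(xy) found above.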


*Exercise 2.4 


Given f(x,y) = e^{2x+3y}, calculate f_x(0,0), f_y(0,0), f_xx(0,0), f_yy(0,0), f_xy(0,0)
and f_yx(0,0).


The ideas in this subsection can be extended to functions of more than two 
variables, though we do not do so here. 


2.3 Functions of two variables 


In the case of a function f of one variable, say x, you saw in Subsection 2.1 
that the nth-order Taylor polynomial agrees with f in value and in the 
values of the first n derivatives at the chosen point x = a. This property of
agreement in function value and values of the derivatives is crucial in the 
definition of Taylor polynomials, and is the property that we generalize to 
more than one variable. 


Consider, for example, the function

$$f(x, y) = e^{x + 2y}.$$


If we wish to find a first-order polynomial p(x,y) that approximates f(x,y)
near (0,0), then it seems that we must ensure that p agrees with f in value
and in the values of the first partial derivatives at (0,0). That is, we need
to find a first-order polynomial

$$p(x, y) = \alpha + \beta x + \gamma y, \tag{2.5}$$

where α, β and γ are constants, with the properties that

$$p(0, 0) = f(0, 0), \qquad p_x(0, 0) = f_x(0, 0), \qquad p_y(0, 0) = f_y(0, 0).$$

In our example, f_x(x,y) = e^{x+2y} and f_y(x,y) = 2e^{x+2y}, so

$$f(0, 0) = 1, \qquad f_x(0, 0) = 1, \qquad f_y(0, 0) = 2.$$


A first-order (linear) polynomial in one variable has the form f(x) = c + mx,
where c and m are constants. A first-order polynomial in two variables has
linear terms in both variables, so is of the form f(x,y) = c + mx + ny,
where c, m and n are constants.


Thus, in defining p, we need to choose the coefficients α, β and γ so that

$$p(0, 0) = 1, \qquad p_x(0, 0) = 1, \qquad p_y(0, 0) = 2.$$

From Equation (2.5), p(0,0) = α, so we need α = 1. Also, differentiating
Equation (2.5), we obtain

$$p_x(x, y) = \beta, \qquad p_y(x, y) = \gamma.$$

In particular,

$$p_x(0, 0) = \beta, \qquad p_y(0, 0) = \gamma.$$

So we need β = 1 and γ = 2. The required Taylor polynomial is therefore

$$p(x, y) = 1 + x + 2y.$$


More generally, whatever the function f(x,y) may be (provided that we can
find its first partial derivatives), we can make $p(x,y) = \alpha + \beta x + \gamma y$ agree
with f(x,y) at (0,0) by setting $\alpha = f(0,0)$. Also, we can make the first
partial derivatives agree at (0,0) by setting $\beta = f_x(0,0)$ and $\gamma = f_y(0,0)$.


Definition 


For a function f(x,y) that is sufficiently smooth near (x,y) = (0,0), the
first-order Taylor approximation (or tangent approximation)
to f(x,y) near (0,0) is

\[ p_1(x,y) = f(0,0) + f_x(0,0)\,x + f_y(0,0)\,y. \]
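As a rough numerical illustration of this definition, the sketch below compares the running example $f(x,y) = e^{x+2y}$ with its tangent approximation $1 + x + 2y$ at points $(h, h)$ approaching the origin. The function names `f`, `p1` and `tangent_error` are illustrative choices, not part of the course text.

```python
import math

def f(x, y):
    # Running example from the text: f(x, y) = e^(x + 2y).
    return math.exp(x + 2 * y)

def p1(x, y):
    # Tangent approximation near (0, 0):
    # f(0,0) + f_x(0,0) x + f_y(0,0) y with f(0,0) = 1, f_x = 1, f_y = 2.
    return 1 + x + 2 * y

def tangent_error(h):
    # Absolute error of the tangent approximation at the point (h, h).
    return abs(f(h, h) - p1(h, h))

for h in (0.1, 0.01, 0.001):
    print(h, tangent_error(h))
```

The error shrinks roughly like $h^2$ as $h$ decreases, which is what one would expect when the neglected terms are of second order.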


*Exercise 2.5 


Given f(z, y) = e?*-”, find the tangent approximation to f(x,y) near (0,0). 


As in the case of functions of one variable, we obtain a more accurate approximation
than the tangent approximation if we use a second-order polynomial
that agrees with the function not only in the above respects, but also in the
values of the second partial derivatives at (0,0). (For the moment, we continue
to consider approximations near (0,0), though we shall generalize shortly.)

A general second-order polynomial in x and y takes the form
\[ q(x,y) = \alpha + \beta x + \gamma y + Ax^2 + Bxy + Cy^2. \tag{2.6} \]


In order to fit the value of q(x,y) and its first and second partial derivatives
at (0,0) to those of a function f(x,y), it is necessary to determine the partial
derivatives of q. You are asked to do this in Exercise 2.6.


*Exercise 2.6 

For q(x, y) as described in Equation (2.6): 

(a) find the functions $q_x$, $q_y$, $q_{xx}$, $q_{xy}$ and $q_{yy}$;

(b) evaluate q(x, y) and its first and second partial derivatives at (0,0). 


From the result of Exercise 2.6, it follows that in order to approximate
f(x,y) near (0,0) by q(x,y) in Equation (2.6), we must set

\[ \alpha = f(0,0), \quad \beta = f_x(0,0), \quad \gamma = f_y(0,0), \]
\[ A = \tfrac12 f_{xx}(0,0), \quad B = f_{xy}(0,0), \quad C = \tfrac12 f_{yy}(0,0). \]
Definition

For a function f(x,y) that is sufficiently smooth near (x,y) = (0,0),
the second-order Taylor approximation (or quadratic approximation)
to f(x,y) near (0,0) is

\[ p_2(x,y) = f(0,0) + f_x(0,0)\,x + f_y(0,0)\,y + \tfrac12\bigl(f_{xx}(0,0)\,x^2 + 2f_{xy}(0,0)\,xy + f_{yy}(0,0)\,y^2\bigr). \tag{2.7} \]


If we take only the linear terms in Equation (2.7), we obtain the first-order 
Taylor polynomial. This is the equation of the tangent plane at (0,0). 


*Exercise 2.7 


Verify that $z = g(x,y) = f(0,0) + f_x(0,0)\,x + f_y(0,0)\,y$ is the equation of a
plane through (0,0, f(0,0)). Show that the plane has the same gradient as 
the surface z = f(x,y) at (0,0, f(0,0)). 


It is also possible to obtain similar approximations for f(x,y) near an arbitrary
point (a,b). In this case, x is replaced by $x - a$, y is replaced by $y - b$,
and the function f and its partial derivatives are evaluated at (a,b).


Definitions 


For a function f(x,y) that is sufficiently smooth near (x,y) = (a,b), the
first-order Taylor polynomial for f(x,y) about (a,b) (or tangent
approximation to f(x,y) near (a,b)) is

\[ p_1(x,y) = f(a,b) + f_x(a,b)(x - a) + f_y(a,b)(y - b). \]

The tangent plane to the surface z = f(x,y) at (a, b, f(a,b)) is given
by $z = p_1(x,y)$.

For a function f(x,y) that is sufficiently smooth near (x,y) = (a,b),
the second-order Taylor polynomial for f(x,y) about (a,b) (or
quadratic approximation to f(x,y) near (a,b)) is

\[ p_2(x,y) = f(a,b) + f_x(a,b)(x - a) + f_y(a,b)(y - b) + \tfrac12\bigl(f_{xx}(a,b)(x - a)^2 + 2f_{xy}(a,b)(x - a)(y - b) + f_{yy}(a,b)(y - b)^2\bigr). \tag{2.8} \]


Example 2.4 


Determine the Taylor polynomials of degrees 1 and 2 about (2,1) for the
function

\[ f(x,y) = x^3 + xy - 2y^2. \]

Solution

Differentiating the function partially with respect to x and partially with
respect to y gives

\[ f_x = 3x^2 + y, \quad f_y = x - 4y. \]

Differentiating partially again gives

\[ f_{xx} = 6x, \quad f_{xy} = 1, \quad f_{yy} = -4. \]

It follows that

\[ f(2,1) = 8, \quad f_x(2,1) = 13, \quad f_y(2,1) = -2, \]
\[ f_{xx}(2,1) = 12, \quad f_{xy}(2,1) = 1, \quad f_{yy}(2,1) = -4, \]

and therefore

\[ p_1(x,y) = 8 + 13(x - 2) - 2(y - 1), \]
\[ p_2(x,y) = 8 + 13(x - 2) - 2(y - 1) + 6(x - 2)^2 + (x - 2)(y - 1) - 2(y - 1)^2. \]
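The result of Example 2.4 can be checked numerically. The following sketch builds $p_1$ and $p_2$ from the derivative values at (2,1) (8, 13, −2, 12, 1 and −4) using Equation (2.8), and compares them with $f(x,y) = x^3 + xy - 2y^2$ at a nearby point. The helper name `taylor2` and its argument order are assumptions made for this illustration only.

```python
def taylor2(f0, fx, fy, fxx, fxy, fyy, a, b):
    """Build the first- and second-order Taylor polynomials about (a, b)
    from the values of f and its partial derivatives at (a, b)."""
    def p1(x, y):
        return f0 + fx * (x - a) + fy * (y - b)

    def p2(x, y):
        dx, dy = x - a, y - b
        return p1(x, y) + 0.5 * (fxx * dx**2 + 2 * fxy * dx * dy + fyy * dy**2)

    return p1, p2

def f(x, y):
    # The function of Example 2.4.
    return x**3 + x * y - 2 * y**2

# Derivative values at (2, 1) taken from the worked solution.
p1, p2 = taylor2(8, 13, -2, 12, 1, -4, a=2, b=1)

x, y = 2.1, 0.9
print(f(x, y), p1(x, y), p2(x, y))
```

At a point close to (2,1), the second-order polynomial should be noticeably closer to f than the first-order one.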


*Exercise 2.8 


Use the results of Exercise 2.4 to write down the second-order Taylor polynomial
for $f(x,y) = e^{2x+3y}$ about (0,0).


We could extend the arguments of this subsection to define Taylor poly- 
nomials of order higher than two, or even to functions of more than two 
variables, but we do not do so here. 


End-of-section Exercises 


Exercise 2.9 


In Exercise 1.2 we discussed the potential energy U of the mechanical system
shown in Figure 1.2, and you saw that $U(\theta,\phi) = -mga(\cos\theta + \cos\phi)$. Find
the second-order Taylor polynomial for $U(\theta,\phi)$ about (0,0).


Exercise 2.10 


Determine the second-order Taylor polynomial about (0,0) for the function 
f(x,y) =e + (x+y)? 


Exercise 2.11 


Determine the second partial derivatives of f(a, y) = (x? + 2y? — 32y)?. 
Evaluate these partial derivatives at (1,—1). 


Exercise 2.12 


(a) Determine the Taylor polynomials of degrees 1 and 2 about (0,0) for 
the function f(x,y) = (2 +2 + 2y)?. Compare your answers to the ex- 
pression obtained by expanding (2 + x + 2y)?. 


(b) Determine the Taylor polynomials of degrees 1 and 2 about (1,—1) for 
the function f(x,y) = (2+ 2+ 2y)?. 


(c) Putting X = x-—1 and Y=y+1 (so that X =0 when x = 1, and 
Y =0 when y = —1), we have 
f(x,y) = (2404 2y)) = (24+ (X41) +2(Y-1))° 
=(1+X+2Y)*. 


Write down the Taylor polynomials of degrees 1 and 2 about (0,0) for 
the function F(X,Y) = (1+ X 4+ 2Y)?. 


3 Classification of stationary points


The main purpose of this section is to extend to functions of two variables 
the Second Derivative Test that was discussed in Section 1. If f(x,y) is 
sufficiently smooth that the first and second partial derivatives are defined, 
and if $f_x(a,b) = f_y(a,b) = 0$ at some point (a,b), then there is a distinct
possibility that f(x,y) has a local maximum or a local minimum at (a, b). 


In the case of a function of one variable, the sign of the second derivative 
at a stationary point is enough to distinguish whether that point is a local 
maximum or a local minimum (provided that the second derivative is not 
zero at that point). The situation is more complicated for functions of two 
variables (for a start, there are three second partial derivatives to consider), 
but, similarly, a knowledge of the values of these derivatives at a stationary 
point will often tell us whether it is a local maximum, a local minimum, or 
neither. 


3.1 Extrema 


In searching for the local maxima and local minima of a function of one 
variable, the first step is to locate the stationary points, i.e. the points 
where the derivative is zero. The same is true in the case of functions of two 
or more variables. 


Definition 


A stationary point of a function f(x,y) is a point (a,b) in the domain
of f(x,y) at which $f_x(a,b) = f_y(a,b) = 0$.


The corresponding point (a,b, f(a,b)) on the surface S defined by 
z= f(x,y) is a stationary point on S. 


Example 3.1 
Locate the stationary point(s) of the function $f(x,y) = 5 + (x - 1)^3 + y^2$.
Solution


Partially differentiating gives $f_x = 3(x - 1)^2$, which is zero when x = 1, and
$f_y = 2y$, which is zero when y = 0. So (1,0) is the only stationary point
(corresponding to the point (1,0,5) on the surface).


Generally, to find the stationary point(s), we need to solve a pair of simul- 
taneous equations, as the following example shows. 
Example 3.2
Locate the stationary point(s) of the function
\[ f(x,y) = x^2 + y^2 + (x - 1)(y + 2). \]
Solution

Partially differentiating gives $f_x = 2x + y + 2$ and $f_y = 2y + x - 1$. To find
the stationary points, we need to solve the pair of simultaneous equations

\[ 2x + y = -2, \]
\[ x + 2y = 1. \]

We find that $x = -\tfrac53$ and $y = \tfrac43$. It follows that $(-\tfrac53, \tfrac43)$ is the only stationary
point.
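When the equations for a stationary point are linear, as in Example 3.2, they can be solved mechanically. The sketch below applies Cramer's rule with exact rational arithmetic to the system $2x + y = -2$, $x + 2y = 1$; the helper `solve_2x2` is a hypothetical name introduced here, not something defined in the course.

```python
from fractions import Fraction

def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a11*x + a12*y = b1, a21*x + a22*y = b2 by Cramer's rule.

    Integer coefficients are assumed so that the answer is an exact fraction.
    """
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("the system has no unique solution")
    x = Fraction(b1 * a22 - a12 * b2, det)
    y = Fraction(a11 * b2 - a21 * b1, det)
    return x, y

# f_x = 0 and f_y = 0 for f(x, y) = x^2 + y^2 + (x - 1)(y + 2).
x, y = solve_2x2(2, 1, 1, 2, -2, 1)
print(x, y)
```

Substituting the returned values back into $f_x = 2x + y + 2$ and $f_y = 2y + x - 1$ gives zero in both cases, which confirms the stationary point.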
(The definition of a stationary point can be extended to functions of three or more variables.)


*Exercise 3.1 


Locate the stationary point(s) of the function 


\[ f(x,y) = 3x^2 - 4xy + 2y^2 + 4x - 8y. \]


A word of warning! The simultaneous equations that must be solved in 
order to find stationary points are in general non-linear. For example, if 


\[ f(x,y) = e^{x+2y} + x^4 y + xy^3, \]

then $f_x = e^{x+2y} + 4x^3 y + y^3$ and $f_y = 2e^{x+2y} + x^4 + 3xy^2$, so we need to
solve the pair of simultaneous equations

\[ e^{x+2y} + 4x^3 y + y^3 = 0, \]
\[ 2e^{x+2y} + x^4 + 3xy^2 = 0. \]


We shall not ask you to tackle problems as difficult as this by hand, but the 
next exercise involves a pair of non-linear equations that can be solved by 
factorization. 


*Exercise 3.2 


Locate the stationary points of the function $f(x,y) = xy(x + y - 3)$.


If a function f of n variables is smooth enough to have first partial derivatives
everywhere on $\mathbb{R}^n$, then any local maxima or local minima that exist will
occur at stationary points. For a function of one variable, we could apply the 
Second Derivative Test to the stationary points. We now generalize this to a 
method of classifying the stationary points of functions of several variables, 
using second partial derivatives. 


The purpose of this section is to describe two ways of doing this. The first 
is particularly useful for functions of two variables, while the second can be 
used for functions of any number of variables. But, before going any further, 
we need definitions of ‘local maximum’ and ‘local minimum’ for functions of 
more than one variable. We shall state such definitions for functions of two 
variables; it is not difficult to see how to generalize these to several variables. 


Definitions 


A function f(x,y), defined on a domain D, has a local minimum
at (a,b) in D if, for all (x,y) in D sufficiently close to (a,b), we have

\[ f(x,y) \ge f(a,b). \]

A function f(x,y), defined on a domain D, has a local maximum
at (a,b) in D if, for all (x,y) in D sufficiently close to (a,b), we have

\[ f(x,y) \le f(a,b). \]


A point that is either a local maximum or a local minimum is an 
extremum (and vice versa). 


If the function f(x,y) has an extremum at (a,b), then the section function 
z= f(x,b) (a function of x only) must also have an extremum at x = a, 
so $(\partial f/\partial x)(a,b) = 0$. Similarly, $(\partial f/\partial y)(a,b) = 0$. It follows that every
extremum of f(x,y) is a stationary point of f(x,y). However, not every 
stationary point is necessarily an extremum. 
Definition


A stationary point of a function of two variables that is not an extremum
is a saddle point.


The term saddle point originates from the shape of surfaces near some such 
points for functions of two variables. An example is provided by the shape of 
the hyperboloid shown in Figure 1.4(b) near a stationary point (in this case, 
the origin) — it looks like a rider’s saddle. (A saddle drops down on either 
side of the rider, but rises up in front and behind.) For completeness, it 
is worth noting that there are more complicated possibilities for the shapes 
of surfaces near stationary points that are not extrema; all such points are 
referred to as saddle points, however. 


Exercise 3.3 


In Exercise 1.2 you found that the potential energy U of the mechanical
system shown in Figure 1.2 can be written as $U(\theta,\phi) = -mga(\cos\theta + \cos\phi)$.
Show that $U(\theta,\phi)$ has a stationary point at (0,0), and that this point is a
local minimum.


If we are looking for the extrema of a given function, then we know that 
we should look amongst the stationary points. However, not all cases are 
as straightforward as Exercise 3.3, and we shall need some general means of 
classifying them (much as we have for functions of one variable). You will 
see next that we can use the second-order Taylor polynomial to construct a 
useful test that will often distinguish between local maxima, local minima 
and saddle points. 


Let f(x,y) have a stationary point at (a,b). To ensure that f(x,y) has 
a local minimum at (a,b), it is not enough to stipulate that each of the 
section functions f(x,b) and f(a,y) through (a,b) has a local minimum at
that point. There may still be directions through (a,b) along which the 
value of f(x,y) decreases as we move away from (a,b). 


For example, consider the function $f(x,y) = x^2 + 6xy + 7y^2$, which possesses
a stationary point at (0,0). The section functions through (0,0) are
$f(x,0) = x^2$ and $f(0,y) = 7y^2$, each of which has a local minimum at (0,0).
But let us move along the parametrized curve (actually a straight line) given
by $x(t) = 2t$, $y(t) = -t$. Then

\[ f(x(t), y(t)) = (2t)^2 + 6(2t)(-t) + 7(-t)^2 = -t^2. \]

As we move along this line from (0,0), the value of f(x(t), y(t)) becomes
negative. As f(0,0) = 0, it follows that (0,0) cannot be a local minimum
of f.
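The behaviour just described is easy to confirm numerically. The sketch below evaluates $f(x,y) = x^2 + 6xy + 7y^2$ along both coordinate axes and along the line $x = 2t$, $y = -t$; the helper name `along_line` is introduced purely for this illustration.

```python
def f(x, y):
    # The example function with a stationary point at the origin.
    return x**2 + 6 * x * y + 7 * y**2

def along_line(t):
    # Value of f on the line x(t) = 2t, y(t) = -t, which the text shows is -t^2.
    return f(2 * t, -t)

for t in (0.5, 0.1, 0.01):
    # Positive on both axes, negative on the line, however small t is.
    print(t, f(t, 0), f(0, t), along_line(t))
```

Both section functions are positive away from the origin, yet the function is negative along the chosen line, so the origin cannot be a local minimum.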


One way to understand the reason for this behaviour is to express the func- 
tion as a difference of two squares: 


\[ f(x,y) = x^2 + 6xy + 7y^2 = 2(x + 2y)^2 - (x + y)^2. \]


Thus, no matter how close to the origin we look, we see some points where 
the function is positive and some where it is negative. In particular, in 
the direction along which x + 2y = 0, the expression on the right-hand side 
reduces to —(x + y)?, which is negative except when (x, y) = (0,0). However, 
in the direction x + y= 0 the expression is 2(x + 2y)?, which is positive 
except when (x,y) = (0,0). Therefore (0,0) is a saddle point of f(x,y).
($f_x = 2x + 6y$, so $f_x(0,0) = 0$; $f_y = 6x + 14y$, so $f_y(0,0) = 0$.)

(Strictly, as a difference of two squares this is $(\sqrt{2}\,x + 2\sqrt{2}\,y)^2 - (x + y)^2$.)


In general, f(x,y) may not be a function that can be manipulated as easily
as the polynomial above. Nevertheless, the quadratic approximation to
f(x,y) about (a,b) is a polynomial, and you will see in the next subsection 
that its second-order terms can be manipulated in this way. Moreover, you 
will see that this is usually sufficient for us to be able to classify the station- 
ary point at (a,b). The test that we shall derive, based on the quadratic 
approximation, is similar to the Second Derivative Test for functions of one 
variable. 


Before going on to derive this test, it is worth recalling the logic behind the 
Second Derivative Test. The quadratic approximation to a function f(x) of 
one variable about x = a is given by 


\[ f(x) \approx f(a) + f'(a)(x - a) + \tfrac12 f''(a)(x - a)^2. \]


If a is a stationary point, $f'(a) = 0$ and so $f(x) - f(a) \approx \tfrac12 f''(a)(x - a)^2$.
Therefore (provided that $f''(a) \ne 0$) the quadratic approximation will have
a minimum or a maximum, depending on the sign of $f''(a)$. Close enough
to a, the approximation will behave like the function itself, thus allowing 
us to conclude that the function has a local minimum or maximum. Thus 
the Second Derivative Test uses the second-order terms in the quadratic 
approximation to classify the stationary point. 


3.2 $AC - B^2$ criterion


In considering a stationary point of a function f(x,y), it will make the
algebra easier if we take the stationary point to be at the origin, so that
$f_x(0,0) = 0$ and $f_y(0,0) = 0$. (It will not be hard to generalize later.)
It will also be useful to write

\[ A = f_{xx}(0,0), \quad B = f_{xy}(0,0)\ (= f_{yx}(0,0)), \quad C = f_{yy}(0,0). \]

The quadratic approximation to f(x,y) is the second-order Taylor polynomial.
Since we are assuming that the first derivatives are zero at (0,0), we have

\[ f(x,y) \approx f(0,0) + \tfrac12\bigl(Ax^2 + 2Bxy + Cy^2\bigr), \]

so

\[ f(x,y) - f(0,0) \approx \tfrac12\bigl(Ax^2 + 2Bxy + Cy^2\bigr). \tag{3.1} \]

(As for functions of one variable, close to a point (a,b), the second-order Taylor
polynomial for f(x,y) about (a,b) behaves in the same way as f(x,y) itself. This is a
consequence of a theorem (similar to Taylor's Theorem) that we do not include in this
course.)

Thus we shall be able to classify the stationary point at (0,0) if we can
determine the sign of the term $\tfrac12(Ax^2 + 2Bxy + Cy^2)$. The multiplication
by $\tfrac12$ is not relevant to determining the sign; thus we shall concentrate on
the term $Ax^2 + 2Bxy + Cy^2$ and, in particular, on expressing it as a sum of
two squares. We consider three possible cases, depending on whether or not
A and/or C is zero.


Case 1: $A \ne 0$

We can express $Ax^2 + 2Bxy + Cy^2$ as

\[ Ax^2 + 2Bxy + Cy^2 = \frac{1}{A}\bigl((Ax + By)^2 + (AC - B^2)y^2\bigr). \]

(Multiply out the right-hand side, and you will find that it gives the left-hand side.)

Thus the sign of A and the sign of AC — B? between them tell us what kind 
of stationary point we have at the origin (provided that both are non-zero). 
If both are positive, we have a local minimum. If A is negative and AC — B? 
is positive, we have a local maximum. If AC — B? is negative, whatever the 
sign of A, the expression is a difference of two squares, so we have a saddle 
point. 
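The Case 1 identity can also be spot-checked with exact arithmetic rather than multiplied out by hand. The sketch below compares the two sides for a small grid of integer values; the function names are illustrative, and such a check is a sanity test, not a proof.

```python
from fractions import Fraction

def quadratic(A, B, C, x, y):
    # Left-hand side: A x^2 + 2 B x y + C y^2.
    return A * x * x + 2 * B * x * y + C * y * y

def completed_square(A, B, C, x, y):
    # Right-hand side of the Case 1 identity; requires a non-zero integer A.
    return Fraction(1, A) * ((A * x + B * y) ** 2 + (A * C - B * B) * y * y)

values = range(-2, 3)
agree = all(
    quadratic(A, B, C, x, y) == completed_square(A, B, C, x, y)
    for A in (-3, 1, 2)
    for B in values
    for C in values
    for x in values
    for y in values
)
print(agree)
```

Because the right-hand side is $1/A$ times a square plus $(AC - B^2)$ times a square, the signs of $A$ and $AC - B^2$ decide the sign of the whole expression, as the paragraph above explains.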
If $AC - B^2 = 0$, we do not have enough information to classify the stationary
point at the origin. Along the direction in which $Ax + By = 0$, the quadratic
expression $Ax^2 + 2Bxy + Cy^2$ is zero. But terms as yet uncalculated in (say)
$x^3$ or $y^4$ may be positive or negative, and we have to proceed to higher-order
Taylor approximations in order to determine the nature of the stationary
point.


Case 2: $A = 0$, $C \ne 0$

We can express $Ax^2 + 2Bxy + Cy^2$ as

\[ Ax^2 + 2Bxy + Cy^2 = \frac{1}{C}\bigl((Bx + Cy)^2 - (Bx)^2\bigr). \]

Provided that $B \ne 0$, this is a difference of two squares, so we have a saddle
point. (Since, for $A = 0$ and $B \ne 0$, $AC - B^2 < 0$, as in Case 1 we can say
that the condition $AC - B^2 < 0$ leads to a saddle point.)


If B=0 (and so AC — B? = 0), we do not have enough information to 
classify the stationary point at the origin. Along the direction in which 
y = 0, the quadratic expression $Ax^2 + 2Bxy + Cy^2$ is zero, so (as in Case 1)
we have to proceed to higher-order Taylor approximations if we wish to 
determine the nature of the stationary point. 


Case 3: $A = C = 0$

We can express $Ax^2 + 2Bxy + Cy^2$ as

\[ Ax^2 + 2Bxy + Cy^2 = 2Bxy = \frac{B}{2}\bigl((x + y)^2 - (x - y)^2\bigr). \]

Again, provided that $B \ne 0$, we have a difference of two squares and there
is a saddle point at the origin. (Again, the condition $B \ne 0$ implies that
$AC - B^2 < 0$.)


If B=0 (and so AC — B? = 0), we do not have enough information to 
classify the stationary point at the origin, since the quadratic expression 
$Ax^2 + 2Bxy + Cy^2$ is zero, so (as in Cases 1 and 2) we have to proceed to
higher-order Taylor approximations in order to determine the nature of the 
stationary point. 


All three cases can be generalized to the situation where the stationary point 
is not at the origin. When it is at (a,b), we evaluate the partial derivatives 
at (a,b) and replace Equation (3.1) by the quadratic approximation at (a, b): 


\[ f(x,y) - f(a,b) \approx \tfrac12\bigl(A(x - a)^2 + 2B(x - a)(y - b) + C(y - b)^2\bigr). \]
Setting $p = x - a$ and $q = y - b$, we now have the quadratic expression

\[ Ap^2 + 2Bpq + Cq^2 \tag{3.2} \]
to analyse.


If this is always positive, we have a local minimum at (a,b). 

If it is always negative, we have a local maximum at (a,b). 

If it is sometimes positive and sometimes negative, we have a saddle point 
at (a,b). 


The conditions on A, B and C for achieving these conclusions are exactly 
the same as in our analysis of the case of a stationary point at (0,0). 
Test for classifying a stationary point

Given that the (sufficiently smooth) function f(x,y) has a stationary
point at (a,b), let

\[ A = f_{xx}(a,b), \quad B = f_{xy}(a,b)\ (= f_{yx}(a,b)), \quad C = f_{yy}(a,b). \]

(a) If $AC - B^2 > 0$, there is:
(i) a local minimum at (a,b) if $A > 0$;
(ii) a local maximum at (a,b) if $A < 0$.
(b) If $AC - B^2 < 0$, there is a saddle point at (a,b).
(c) If $AC - B^2 = 0$, the test is unable to classify the stationary point.
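The test above translates directly into a few lines of code. The following is a minimal sketch; the function name `classify` and the returned strings are choices made for this illustration, and the caller is assumed to supply the second derivatives already evaluated at the stationary point.

```python
def classify(A, B, C):
    """Apply the AC - B^2 test to the values A = f_xx, B = f_xy, C = f_yy
    at a stationary point of a sufficiently smooth function f(x, y)."""
    disc = A * C - B * B
    if disc > 0:
        # A cannot be zero here, since AC > B^2 >= 0.
        return "local minimum" if A > 0 else "local maximum"
    if disc < 0:
        return "saddle point"
    return "inconclusive"

# f(x, y) = x^2 + 6xy + 7y^2 at the origin: A = 2, B = 6, C = 14.
print(classify(2, 6, 14))
```

For the quadratic $x^2 + 6xy + 7y^2$ discussed earlier, $AC - B^2 = 28 - 36 < 0$, so the test agrees with the conclusion reached there by writing the function as a difference of two squares.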


Example 3.3 

Locate and classify the stationary point of the function $f(x,y) = e^{-(x^2+y^2)}$.
Solution

Partially differentiating gives $f_x = -2x e^{-(x^2+y^2)}$ and $f_y = -2y e^{-(x^2+y^2)}$.

Since $f_x = 0$ only when x = 0, and $f_y = 0$ only when y = 0, the stationary
point is at (0,0).

Since $f_{xx} = -2e^{-(x^2+y^2)} + 4x^2 e^{-(x^2+y^2)}$, we have $A = f_{xx}(0,0) = -2$. Also,
$f_{yy} = -2e^{-(x^2+y^2)} + 4y^2 e^{-(x^2+y^2)}$, therefore $C = f_{yy}(0,0) = -2$. Finally,
$f_{xy} = 4xy e^{-(x^2+y^2)}$, therefore $B = f_{xy}(0,0) = 0$.

So we see that $AC - B^2 = 4 > 0$, and since $A = -2 < 0$, the stationary
point is a local maximum.


*Exercise 3.4 
Locate and classify the stationary point of the function 


\[ f(x,y) = 2x^2 - xy - 3y^2 - 3x + 7y. \]


*Exercise 3.5 


Locate and classify the four stationary points of the function 


\[ f(x,y) = x^3 - 12x - y^3 + 3y. \]


3.3 Classifying stationary points using eigenvalues 


In Subsection 3.2 you saw that the critical factor in classifying a stationary 
point (a,b) of a function of two independent variables is the expression 


\[ Ap^2 + 2Bpq + Cq^2, \tag{3.2} \]

where A, B and C are the second derivatives $f_{xx}(a,b)$, $f_{xy}(a,b)$ and $f_{yy}(a,b)$,
respectively, and $p = x - a$ and $q = y - b$.

In this subsection you will see how to use eigenvalues to classify a stationary
point. In order to use eigenvalues, we need to think in terms of matrices. In
particular, since $B = f_{xy}(a,b) = f_{yx}(a,b)$, it seems that B may be expected
to occur twice in a relevant matrix expression. Indeed, it turns out that the
matrix we require is the symmetric matrix

\[ M = \begin{bmatrix} A & B \\ B & C \end{bmatrix}. \]

(As M is real and symmetric, it has real eigenvalues and eigenvectors (see Unit 10).)
Exercise 3.6
Verify that

\[ \begin{bmatrix} p & q \end{bmatrix} M \begin{bmatrix} p \\ q \end{bmatrix} = Ap^2 + 2Bpq + Cq^2. \]

Assuming that x and y can take any values, so can p and q. So, for certain
values of x and y, $[p \;\; q]^T$ will be an eigenvector of M, corresponding to a
real eigenvalue $\lambda$. Then

\[ \begin{bmatrix} p & q \end{bmatrix} M \begin{bmatrix} p \\ q \end{bmatrix} = \begin{bmatrix} p & q \end{bmatrix} \lambda \begin{bmatrix} p \\ q \end{bmatrix} = \lambda(p^2 + q^2). \]

That is, if $[p \;\; q]^T$ is an eigenvector of M with eigenvalue $\lambda$, then
expression (3.2) is given by

\[ Ap^2 + 2Bpq + Cq^2 = \lambda(p^2 + q^2). \]


If M has a positive eigenvalue and a negative eigenvalue, then, for certain
values of x and y, $[p \;\; q]^T$ will be an eigenvector corresponding to the positive
eigenvalue, and for certain other values of x and y, $[p \;\; q]^T$ will be
an eigenvector corresponding to the negative eigenvalue. Therefore, since
$p^2 + q^2$ is always positive, the expression $Ap^2 + 2Bpq + Cq^2$ will sometimes
be positive and sometimes negative, according to the sign of $\lambda$. Furthermore,
since a scalar multiple of an eigenvector is still an eigenvector, corresponding
to the same eigenvalue, we can take p and q as close to zero as we like, and
hence x and y as close to a and b as we like. Therefore the stationary point
will be a saddle point.


If both eigenvalues are positive, it seems reasonable to expect that the
expression $Ap^2 + 2Bpq + Cq^2$ will always be positive (and similarly for the
negative case). In fact, this is true, although we shall not prove this statement.


Eigenvalue test for classifying a stationary point

Given that the (sufficiently smooth) function f(x,y) has a stationary
point at (a,b), let

\[ A = f_{xx}(a,b), \quad B = f_{xy}(a,b)\ (= f_{yx}(a,b)), \quad C = f_{yy}(a,b), \quad M = \begin{bmatrix} A & B \\ B & C \end{bmatrix}, \]

and let $\lambda_1$ and $\lambda_2$ be the real eigenvalues of M.

(a) If $\lambda_1$ and $\lambda_2$ are both positive, then there is a local minimum
at (a,b).

(b) If $\lambda_1$ and $\lambda_2$ are both negative, then there is a local maximum
at (a,b).

(c) If $\lambda_1$ and $\lambda_2$ are non-zero and opposite in sign, then there is a saddle
point at (a,b).

(d) If either, or both, of $\lambda_1$ or $\lambda_2$ is zero, then the test is inconclusive.
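For a 2 × 2 symmetric matrix the eigenvalues have a closed form, so the eigenvalue test can be sketched without any linear algebra library. The function names and the tolerance `tol` below are assumptions made for this illustration; the eigenvalues are the roots of $\lambda^2 - (A + C)\lambda + (AC - B^2) = 0$.

```python
import math

def eigenvalues_sym2(A, B, C):
    """Eigenvalues of the symmetric matrix [[A, B], [B, C]], smallest first."""
    mean = (A + C) / 2
    radius = math.hypot((A - C) / 2, B)
    return mean - radius, mean + radius

def classify_by_eigenvalues(A, B, C, tol=1e-12):
    # tol guards against treating rounding noise as a non-zero eigenvalue.
    lo, hi = eigenvalues_sym2(A, B, C)
    if abs(lo) <= tol or abs(hi) <= tol:
        return "inconclusive"
    if lo > 0:
        return "local minimum"
    if hi < 0:
        return "local maximum"
    return "saddle point"

# A = 4, B = -1, C = -6, the values arising at (1, 1) in Exercise 3.4.
print(eigenvalues_sym2(4, -1, -6), classify_by_eigenvalues(4, -1, -6))
```

Since the product of the two eigenvalues is $AC - B^2$ and their sum is $A + C$, this test and the $AC - B^2$ test always reach the same verdict for functions of two variables.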
(The definition of an eigenvector excludes $[p \;\; q]^T = [0 \;\; 0]^T$.)

For example, if we apply this test to the function

\[ f(x,y) = 2x^2 - xy - 3y^2 - 3x + 7y \]

(discussed in Exercise 3.4), we have a stationary point at (1,1), and $A = 4$,
$B = -1$ and $C = -6$. It follows that

\[ M = \begin{bmatrix} 4 & -1 \\ -1 & -6 \end{bmatrix}, \]

and the characteristic equation is $\lambda^2 + 2\lambda - 25 = 0$. The eigenvalues are
therefore $\sqrt{26} - 1$ and $-(\sqrt{26} + 1)$. The eigenvalues are non-zero and of
opposite sign, so the stationary point is a saddle point. (We make use of several
ideas from Unit 10 here.)


This test can be extended to functions of three or more variables. If f
is a sufficiently smooth function of n variables, then a stationary point
of f is a point $(a_1, a_2, \ldots, a_n)$ at which all the n first partial derivatives
are zero. There will be $n^2$ second partial derivatives, and their values at
$(a_1, a_2, \ldots, a_n)$ can be written in the form of an n × n matrix M, the entry
in the ith row and jth column being the value

\[ \frac{\partial^2 f}{\partial x_i\,\partial x_j}(a_1, a_2, \ldots, a_n). \]

This is called the Hessian matrix of f at the point $(a_1, \ldots, a_n)$. Our assumption
that f is sufficiently smooth implies that M will be a symmetric matrix,
since the Mixed Derivative Theorem extends from the two-variable case to
the several-variable case. Thus the eigenvalues of M will be real and lead
to the following classification.


(a) If all the eigenvalues are positive, then there is a local minimum
at $(a_1, \ldots, a_n)$.

(b) If all the eigenvalues are negative, then there is a local maximum
at $(a_1, \ldots, a_n)$.

(c) If all the eigenvalues are non-zero but they are not all of the same sign,
then there is a saddle point at $(a_1, \ldots, a_n)$.

(d) If one or more of the eigenvalues is zero, then the test is inconclusive.

(In the case of more than two variables, the analogy of a saddle is not very helpful in
visualizing a saddle point. It remains true that there are sections through a
saddle point giving a section function with a local minimum, and others giving a
section function with a local maximum.)

Example 3.4

Find and classify the stationary point of the function

\[ w(x,y,z) = 3y^2 + 3z^2 - 4xy - 2yz - 4zx. \]

Solution


We find the stationary point by solving the simultaneous equations

\[ w_x = -4y - 4z = 0, \]
\[ w_y = 6y - 4x - 2z = 0, \]
\[ w_z = 6z - 2y - 4x = 0. \]

The only solution is $x = y = z = 0$.

The second partial derivatives of w(x,y,z) are constants:

\[ w_{xx} = 0, \quad w_{yy} = 6, \quad w_{zz} = 6, \]
\[ w_{xy} = w_{yx} = -4, \quad w_{yz} = w_{zy} = -2, \quad w_{zx} = w_{xz} = -4. \]

Thus the required 3 × 3 Hessian matrix is

\[ M = \begin{bmatrix} w_{xx} & w_{xy} & w_{xz} \\ w_{yx} & w_{yy} & w_{yz} \\ w_{zx} & w_{zy} & w_{zz} \end{bmatrix} = \begin{bmatrix} 0 & -4 & -4 \\ -4 & 6 & -2 \\ -4 & -2 & 6 \end{bmatrix}. \]

To find the eigenvalues of M, we find the values of $\lambda$ that satisfy the
characteristic equation $\det(M - \lambda I) = 0$. The solutions of the equation
$\det(M - \lambda I) = -(\lambda - 8)^2(\lambda + 4) = 0$ are $\lambda_1 = -4$, $\lambda_2 = 8$ and $\lambda_3 = 8$. Our
test then tells us immediately that there is a saddle point at the origin.
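The characteristic polynomial quoted in Example 3.4 can be checked by direct evaluation. The sketch below computes $\det(M - \lambda I)$ for the Hessian of the example using cofactor expansion and compares it with $-(\lambda - 8)^2(\lambda + 4)$ at several integer values of $\lambda$; the helper names `det3` and `char_poly` are introduced here for illustration.

```python
def det3(m):
    # Determinant of a 3x3 matrix by cofactor expansion along the first row.
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def char_poly(m, lam):
    # Value of det(M - lam * I) for a 3x3 matrix M.
    shifted = [
        [m[r][col] - (lam if r == col else 0) for col in range(3)]
        for r in range(3)
    ]
    return det3(shifted)

# Hessian matrix of w(x, y, z) = 3y^2 + 3z^2 - 4xy - 2yz - 4zx.
M = [[0, -4, -4],
     [-4, 6, -2],
     [-4, -2, 6]]

print(char_poly(M, -4), char_poly(M, 8))
```

Both printed values vanish, so −4 and 8 are eigenvalues; as they have opposite signs and none is zero, the classification as a saddle point follows.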
Exercise 3.7
Find and classify the stationary point of the function

\[ w(x,y,z) = 3x^2 + 3y^2 + 4z^2 - 2xy - 2yz - 2xz, \]

given that the characteristic equation of the matrix

\[ \begin{bmatrix} 6 & -2 & -2 \\ -2 & 6 & -2 \\ -2 & -2 & 8 \end{bmatrix} \]

is $(8 - \lambda)\bigl((6 - \lambda)^2 - 12\bigr) = 0$.


Exercise 3.8 


Classify the stationary point at the origin of the function

\[ w(x,y,z) = x^2 + 2y^2 + z^2 + 2\sqrt{3}\,xz, \]

given that the characteristic equation for the relevant Hessian matrix has
$4 - \lambda$ as a factor.


Least squares approximation revisited 


In Subsection 3.2 of Unit 9, you were introduced to the technique of finding
the ‘best’ straight line through a set of data points that is subject to
experimental error. The example used there consisted of four measurements
that appeared (within the limits of experimental error) to satisfy a linear
relationship. We sought an expression of the form $y = a_0 + a_1 x$ that best
described the observed data, which were as shown in Table 3.1.

We denoted by $d_i$ the vertical distance of each point from the ‘best’ straight
line, so that

\[ d_i = (a_0 + a_1 x_i) - y_i \quad (i = 1, 2, 3, 4). \tag{3.3} \]

We sought the straight line that minimized the sum of the squares of these
deviations. We then wrote Equations (3.3) in vector form, as

\[ \mathbf{d} = X\mathbf{a} - \mathbf{y}, \tag{3.4} \]

where

\[ \mathbf{d} = [d_1 \;\; d_2 \;\; d_3 \;\; d_4]^T, \quad X = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{bmatrix}, \quad \mathbf{a} = [a_0 \;\; a_1]^T \quad\text{and}\quad \mathbf{y} = [0.9 \;\; 2.1 \;\; 2.9 \;\; 4.1]^T. \]


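The sum of the squares of the deviations is $\mathbf{d}^T\mathbf{d}$, and the transpose of
Equation (3.4) is

\[ \mathbf{d}^T = \mathbf{a}^T X^T - \mathbf{y}^T, \tag{3.5} \]

so Equations (3.4) and (3.5) yield

\[ \mathbf{d}^T\mathbf{d} = \mathbf{a}^T X^T X \mathbf{a} - 2\mathbf{a}^T X^T \mathbf{y} + \mathbf{y}^T\mathbf{y}. \tag{3.6} \]

In Unit 9 it was stated that the vector $\mathbf{a}$ that minimizes this expression
satisfies

\[ (X^T X)\mathbf{a} = X^T \mathbf{y}. \tag{3.7} \]

It was also stated that the explanation of this would be postponed until
Unit 12, so the time has come! Let us write the known quantities as

\[ X^T X = \begin{bmatrix} S & T \\ T & U \end{bmatrix}, \quad X^T \mathbf{y} = \begin{bmatrix} v \\ w \end{bmatrix}, \quad \mathbf{y}^T\mathbf{y} = z. \]

Let us also put $a_0 = x$ and $a_1 = y$. Then, by Equation (3.6), $\mathbf{d}^T\mathbf{d}$ is a
function of x and y, namely

\[ \mathbf{d}^T\mathbf{d} = f(x,y) = Sx^2 + 2Txy + Uy^2 - 2vx - 2wy + z. \tag{3.8} \]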
(The functions in Exercises 3.7 and 3.8 recur in Activity 4.2, where you are asked to use
the computer algebra package for the course to locate and classify the stationary points.)

Table 3.1

x | 1    2    3    4
y | 0.9  2.1  2.9  4.1

(Calculating $X^T X$ for the data above gives S = 4, T = 10 and U = 30.)


Exercise 3.9 


Assuming that $X^T X$ is invertible, show that the only stationary point of the
function f(x,y) defined in Equation (3.8) is at the point whose coordinates
satisfy Equation (3.7).


Therefore, provided that $X^T X$ is invertible, the sum of the squares of the
deviations $\mathbf{d}^T\mathbf{d}$ will have just one stationary point. It turns out that the
eigenvalues of $X^T X$ are always both positive, so it is a local minimum. The
values of $a_0$ and $a_1$ that produce this minimum will thus be the parameters
of the ‘best’ straight line through the data points.
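To make this concrete, the sketch below forms the quantities S, T, U, v and w for the data of Table 3.1 and solves Equation (3.7) for $a_0$ and $a_1$ by Cramer's rule. The function names are hypothetical, and the code assumes the first column of X consists of ones, as in the straight-line model $y = a_0 + a_1 x$.

```python
xs = [1, 2, 3, 4]
ys = [0.9, 2.1, 2.9, 4.1]

def normal_equations(xs, ys):
    # Entries of X^T X = [[S, T], [T, U]] and X^T y = [v, w]
    # when the rows of X are (1, x_i).
    S = len(xs)
    T = sum(xs)
    U = sum(x * x for x in xs)
    v = sum(ys)
    w = sum(x * y for x, y in zip(xs, ys))
    return S, T, U, v, w

def best_line(xs, ys):
    # Solve (X^T X) a = X^T y for a = (a0, a1).
    S, T, U, v, w = normal_equations(xs, ys)
    det = S * U - T * T
    if det == 0:
        raise ValueError("X^T X is not invertible")
    a0 = (U * v - T * w) / det
    a1 = (S * w - T * v) / det
    return a0, a1

print(normal_equations(xs, ys)[:3], best_line(xs, ys))
```

Here $SU - T^2 = 20 > 0$ and $S > 0$, so by the $AC - B^2$ test applied to Equation (3.8) the stationary point found is indeed a local minimum.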


This gives a justification for Procedure 3.2 (least squares straight line) of 
Unit 9. 


End-of-section Exercises 


Exercise 3.10 
Find and classify the stationary points of the following functions. 
(a) $f(x,y) = \sqrt{1 - x^2 + y^2}$
(b) $T(x,y) = \cos x + \cos y$


Exercise 3.11 


Find and classify the stationary point of the function $f(x,y) = \sqrt{1 - x^2 - y^2}$.


4 Computer activities 


The computer algebra package for the course allows you to locate and classify 
stationary points. 


Use your computer to complete the following activities. 


Activity 4.1 


Locate and classify the stationary points of the following functions of two 
variables. 


(a) $f(x,y) = 2x^2 - xy - 3y^2 - 3x + 7y$
(b) $f(x,y) = x^3 - 12x - y^3 + 3y$
(c) $f(x,y) = \sqrt{1 - x^2 + y^2}$


Activity 4.2 


Locate and classify the stationary points of the following functions of three 
variables. 


(a) $f(x,y,z) = x^2 + y^2 - z^2 + 2xyz + 4z$
(b) $f(x,y,z) = 3x^2 + 3y^2 + 4z^2 - 2xy - 2yz - 2xz$
(c) $f(x,y,z) = x^2 + 2y^2 + z^2 + 2\sqrt{3}\,xz$


(The matrix $X^T X$ is invertible provided that the second column of X contains at least
two distinct entries. We do not prove this here.)

(Activity 4.1(a): see Exercise 3.4. Activity 4.1(b): see Exercise 3.5. Activity 4.1(c): see Exercise 3.10.)

(Activity 4.2(b): see Exercise 3.7. Activity 4.2(c): see Exercise 3.8.)
Outcomes


After studying this unit you should be able to:

calculate first and second partial derivatives of a function of several
variables;

understand the use of a surface to represent a function of two variables;

construct the equation of the tangent plane at a given point on a surface;

calculate the Taylor polynomials of degree n for a function of one variable;

calculate the first-order and second-order Taylor polynomials for a function
of two variables;

locate the stationary points of a function of two (or more) variables by
solving a system of two (or more) simultaneous equations;

classify the stationary points of a function of two variables by using the
$AC - B^2$ test;

classify the stationary points of a function of two (or more) variables by
examining the signs of the eigenvalues of an appropriate matrix;

use the computer algebra package for the course to find and analyse
stationary points of a function of more than one variable.


Solutions to the exercises 


Section 1 


1.1 (a) $f(2,3) = 12 - 54 = -42$
(b) $f(3,2) = 27 - 16 = 11$
(c) $f(a,b) = 3a^2 - 2b^3$
(d) $f(b,a) = 3b^2 - 2a^3$
(e) $f(2a,b) = 3(2a)^2 - 2b^3 = 12a^2 - 2b^3$
(f) $f(a - b, 0) = 3(a - b)^2$
(g) $f(x,2) = 3x^2 - 16$
(h) $f(y,x) = 3y^2 - 2x^3$
1.2 The potential energy U of a particle of mass m
placed at height h (relative to a datum O) is given by
U(h) = mgh (where g is the acceleration due to gravity).
Now A is $a\cos\theta$ below O, and B is $a\cos\phi$ below A;
so B is $a(\cos\theta + \cos\phi)$ below O. Thus, in this case, we
have

\[ U(\theta,\phi) = -mga(\cos\theta + \cos\phi). \]

The least possible value of U occurs when $\cos\theta$ and
$\cos\phi$ take their greatest values, and this happens when
$\theta = \phi = 0$. So the least value of U is $U(0,0) = -2mga$,
and it occurs when the system is hanging vertically.


1.3 The section function of F(x, y) = 100e^(−(x² + y²)) with y fixed
at 0 is F(x, 0) = 100e^(−x²). It follows that

d/dx (F(x, 0)) = −200x e^(−x²).

This derivative is zero when x = 0, so F(x, 0) has a stationary point
at x = 0.
Differentiating again with respect to x, we obtain

d²/dx² (F(x, 0)) = d/dx (−200x e^(−x²))
= −200(1 − 2x²) e^(−x²).

This derivative is −200 when x = 0. It follows that the section
function has a local maximum at x = 0.
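
As a numerical cross-check of Solution 1.3, the following Python sketch estimates the first and second derivatives of the section function F(x, 0) = 100e^(−x²) at x = 0 by central differences. It is only an illustration and is not the course's computer algebra package; the helper names and the step sizes h are assumptions chosen for this example.

```python
import math

def section(x):
    """Section function F(x, 0) = 100 * exp(-x**2)."""
    return 100.0 * math.exp(-x * x)

def first_derivative(g, x, h=1e-5):
    # Central-difference estimate of g'(x) (h is an assumed step size).
    return (g(x + h) - g(x - h)) / (2.0 * h)

def second_derivative(g, x, h=1e-4):
    # Central-difference estimate of g''(x) (h is an assumed step size).
    return (g(x + h) - 2.0 * g(x) + g(x - h)) / (h * h)

slope_at_0 = first_derivative(section, 0.0)       # should be close to 0
curvature_at_0 = second_derivative(section, 0.0)  # should be close to -200
```

A zero slope together with a negative second derivative is what the Second Derivative Test requires for a local maximum.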


1.4 ∂f/∂x = 2x sin(xy) + (x² + y³)y cos(xy),
∂f/∂y = 3y² sin(xy) + (x² + y³)x cos(xy).


1.5 u_θ = cos θ + φ sec² θ and u_φ = tan θ.


1.6 (a) Treating y and t as constants, and differentiating with
respect to x,

f_x = 2y³t⁴x + 2y + 8t²x.

Treating x and t as constants, and differentiating with respect to y,

f_y = 3x²t⁴y² + 2x + 1.

Treating x and y as constants, and differentiating with respect to t,

f_t = 4x²y³t³ + 8x²t.


(b) We have ∂z/∂x = 2(1 + x) and ∂z/∂y = 3(1 + y)².

Sketches of the section functions z(x, 0) = (1 + x)² + 1 and
z(0, y) = 1 + (1 + y)³ are shown in the following figure.

[Figure: graphs of the section functions z(x, 0) (left) and z(0, y) (right)]

We obtain (∂z/∂x)(0, 0) = 2, which represents the slope of the
left-hand graph above where x = 0. Also, we obtain (∂z/∂y)(0, 0) = 3,
which represents the slope of the right-hand graph above where y = 0.


1.7 We obtain ∂z/∂x = 2(1 + x) and ∂z/∂y = 3(1 + y)². So we have

(∂z/∂x)(0, 2) = 2 and (∂z/∂y)(0, 2) = 27.

So

δz ≈ 2δx + 27δy.


1.8 We have ∂z/∂x = cos x, ∂z/∂y = 3 sin y, dx/dt = 2t and
dy/dt = 2. Thus

dz/dt = (∂z/∂x)(dx/dt) + (∂z/∂y)(dy/dt)
= 2t cos x + 6 sin y
= 2t cos(t²) + 6 sin(2t).


1.9 (a) We have
∇f(x, y) = (4xy + 3y³)i + (2x² + 9xy²)j.


(b) ∇f(2, 1) = 11i + 26j, so the slope at (2, 1, 14) in
the direction 3i + 4j is
(11i + 26j) · (3/5 i + 4/5 j) = 137/5.
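
The arithmetic in Solution 1.9(b) can be reproduced with a short Python sketch. It assumes that the intended direction is 3i + 4j, so that the unit vector is (3/5)i + (4/5)j, and it uses the gradient components from part (a). The function names are hypothetical and chosen only for this illustration.

```python
import math

def grad_f(x, y):
    # Gradient components taken from part (a):
    # (4xy + 3y^3) i + (2x^2 + 9xy^2) j.
    return (4 * x * y + 3 * y**3, 2 * x**2 + 9 * x * y**2)

def slope_in_direction(grad, direction):
    # Normalise the direction vector, then take the dot product
    # with the gradient to obtain the slope in that direction.
    length = math.hypot(direction[0], direction[1])
    return (grad[0] * direction[0] + grad[1] * direction[1]) / length

g = grad_f(2, 1)                         # gradient at (2, 1)
slope = slope_in_direction(g, (3, 4))    # assumed direction 3i + 4j
```

Here `g` is the gradient 11i + 26j and `slope` is the value 137/5 = 27.4.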
1.10 We need to evaluate the partial derivatives at x = 2 and y = 1.
Since ∂z/∂x = x and ∂z/∂y = (2√3)y, Equation (1.12) becomes
dz/dt = 2 cos α + 2√3 sin α. We see that the slope dz/dt is a function
of α only, but to make the argument clear we shall replace the slope
dz/dt by s(α), so that s(α) = 2 cos α + 2√3 sin α. The exercise
requires us to find the maximum value of s(α) for varying values of α.

The stationary points of s(α) occur when
ds/dα = −2 sin α + 2√3 cos α = 0, which gives tan α = √3. This
equation has two solutions in the interval 0 < α < 2π, at α = π/3 and
α = 4π/3. Using the Second Derivative Test, we see that α = π/3
corresponds to a maximum value of s(α) (while α = 4π/3 gives a minimum
value). It follows that the greatest slope at the point
(2, 1, 2 + √3) is

2 cos(π/3) + 2√3 sin(π/3) = 1 + (2√3)(√3/2) = 4.

The direction of the greatest slope of z = f(x, y) at (2, 1, 2 + √3)
is at an angle of π/3 measured anticlockwise from the positive x-axis.

Now ∇f(2, 1) = 2i + 2√3 j, so the angle θ between ∇f(2, 1) and the
positive x-axis (measured anticlockwise) is also given by tan θ = √3.
So, since both components of ∇f(2, 1) are positive, θ = π/3. Thus the
direction of greatest slope is the same as the direction of ∇f(2, 1).
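
The conclusion of Solution 1.10 can be checked numerically. The sketch below samples the slope s(α) = 2 cos α + 2√3 sin α on a fine grid of angles and compares the largest value with the magnitude of the gradient 2i + 2√3 j. The grid size n and the variable names are assumptions made for illustration only.

```python
import math

def s(alpha):
    """Slope in the direction at angle alpha: 2 cos(alpha) + 2*sqrt(3) sin(alpha)."""
    return 2.0 * math.cos(alpha) + 2.0 * math.sqrt(3.0) * math.sin(alpha)

# Sample s on a grid over [0, 2*pi) and keep the angle with the largest slope.
n = 36000  # assumed grid size (steps of 0.01 degrees)
best_alpha = max((2.0 * math.pi * k / n for k in range(n)), key=s)
best_slope = s(best_alpha)

# Magnitude of the gradient 2i + 2*sqrt(3)j at (2, 1), for comparison.
grad_magnitude = math.hypot(2.0, 2.0 * math.sqrt(3.0))
```

The sampled maximum occurs at an angle close to π/3, and its value agrees with the magnitude of the gradient, which is 4.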


1.11 We have ∂z/∂x = y cos x, ∂z/∂y = sin x, dx/dt = e^t and
dy/dt = 2t, so

dz/dt = (∂z/∂x)(dx/dt) + (∂z/∂y)(dy/dt)
= (y cos x)e^t + (sin x)2t
= t² cos(e^t)e^t + sin(e^t)2t
= t(te^t cos(e^t) + 2 sin(e^t)).
Thus z′(0) = 0.
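
The Chain Rule result in Solution 1.11 can be compared with a direct numerical derivative. The sketch below assumes z = y sin x with x = e^t and y = t², a form that is consistent with the partial derivatives quoted in the solution but is not restated there. The function names and the step size h are hypothetical.

```python
import math

def z_of_t(t):
    # Assumed form z = y*sin(x) with x = e^t and y = t^2
    # (consistent with dz/dx = y cos x and dz/dy = sin x).
    x, y = math.exp(t), t * t
    return y * math.sin(x)

def dz_dt_chain(t):
    # Chain Rule result: t * (t e^t cos(e^t) + 2 sin(e^t)).
    e = math.exp(t)
    return t * (t * e * math.cos(e) + 2.0 * math.sin(e))

def dz_dt_numeric(t, h=1e-6):
    # Central-difference estimate of dz/dt (h is an assumed step size).
    return (z_of_t(t + h) - z_of_t(t - h)) / (2.0 * h)

differences = [abs(dz_dt_chain(t) - dz_dt_numeric(t)) for t in (0.0, 0.5, 1.0)]
```

Each entry of `differences` should be small, and `dz_dt_chain(0.0)` is exactly zero, in agreement with z′(0) = 0.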


1.12 (a) We have f_x = 3(x + 2y)² − 4(2x − y) and
f_y = 6(x + 2y)² + 2(2x − y), so
∇f(x, y) = (3(x + 2y)² − 4(2x − y))i
+ (6(x + 2y)² + 2(2x − y))j.


(b) The vector i + j has length √2, so d = (1/√2)(i + j) is a unit
vector in the required direction. At the point (1, 0, −3), we have

∇f · d = (−5i + 10j) · (1/√2)(i + j) = 5/√2.




Section 2 


2.1 p_1(0) = 0 and sin 0 = 0, thus p_1(0) = sin 0.

p_1′(x) = 1, so p_1′(0) = 1, and
sin′ x = cos x, so sin′ 0 = 1, thus p_1′(0) = sin′ 0.

p_1″(x) = 0, so p_1″(0) = 0, and
sin″ x = −sin x, so sin″ 0 = 0, thus p_1″(0) = sin″ 0.

p_2(0) = 1 − 0 = 1 and cos 0 = 1, thus p_2(0) = cos 0.

p_2′(x) = −x, so p_2′(0) = 0, and
cos′ x = −sin x, so cos′ 0 = 0, thus p_2′(0) = cos′ 0.

p_2″(x) = −1, so p_2″(0) = −1, and
cos″ x = −cos x, so cos″ 0 = −1, thus p_2″(0) = cos″ 0.

p_2‴(x) = 0, so p_2‴(0) = 0, and
cos‴ x = sin x, so cos‴ 0 = 0, thus p_2‴(0) = cos‴ 0.


2.2 From Example 2.1, we have
p_0(x) = 1, p_1(x) = 1 + 2x and p_2(x) = 1 + 2x + 2x².
Thus p_0(0.1) = 1, p_1(0.1) = 1.2 and p_2(0.1) = 1.22.
Also, f(0.1) = e^0.2 = 1.22140, to 5 d.p.
Thus

p_0(0.1) = f(0.1) to the nearest integer,

p_1(0.1) = f(0.1) to 1 d.p.,

p_2(0.1) = f(0.1) to 2 d.p.
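
The values quoted in Solution 2.2 can be reproduced with a few lines of Python. The sketch assumes f(x) = e^(2x), which is the function implied by f(0.1) = e^0.2. The helper name `taylor_exp2x` is hypothetical.

```python
import math

def taylor_exp2x(x, n):
    """Taylor polynomial of degree n about 0 for exp(2x): sum of (2x)^k / k!."""
    return sum((2.0 * x) ** k / math.factorial(k) for k in range(n + 1))

# p_0(0.1), p_1(0.1) and p_2(0.1), followed by the exact value f(0.1).
values = [taylor_exp2x(0.1, n) for n in range(3)]
exact = math.exp(0.2)
```

The list `values` contains 1, 1.2 and 1.22 (up to rounding error), and `exact` rounds to 1.22140 at 5 decimal places.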


2.3 f_x = sin(xy) + xy cos(xy) and f_y = x² cos(xy), so
f_xx = y cos(xy) + y cos(xy) − xy² sin(xy)
= 2y cos(xy) − xy² sin(xy),
f_yy = −x³ sin(xy),
f_xy = 2x cos(xy) − x²y sin(xy),
f_yx = x cos(xy) + x cos(xy) − x²y sin(xy)
= 2x cos(xy) − x²y sin(xy).


2.4 f_x = 2e^(2x+3y) and f_y = 3e^(2x+3y), so
f_xx = 4e^(2x+3y), f_xy = f_yx = 6e^(2x+3y),
f_yy = 9e^(2x+3y).

Then
f_x(0, 0) = 2, f_y(0, 0) = 3,
f_xx(0, 0) = 4, f_yy(0, 0) = 9,
f_xy(0, 0) = f_yx(0, 0) = 6.

2.5 f_x = 3e^(3x−y) and f_y = −e^(3x−y), so
f(0, 0) = 1, f_x(0, 0) = 3, f_y(0, 0) = −1.
Thus the tangent approximation near (0, 0) is
p_1(x, y) = f(0, 0) + f_x(0, 0)x + f_y(0, 0)y
= 1 + 3x − y.


2.6 (a) q_x = β + 2Ax + By, q_y = γ + Bx + 2Cy,
q_xx = 2A, q_xy = B and q_yy = 2C.


(b) q(0, 0) = α, q_x(0, 0) = β, q_y(0, 0) = γ,
q_xx(0, 0) = 2A, q_xy(0, 0) = B and q_yy(0, 0) = 2C.


2.7 f(0, 0), f_x(0, 0) and f_y(0, 0) are constants, so the
equation is that of a plane. Since g(0, 0) = f(0, 0), the
plane goes through (0, 0, f(0, 0)).
Now g_x(x, y) = f_x(0, 0) and g_y(x, y) = f_y(0, 0), so
g_x(0, 0) = f_x(0, 0) and g_y(0, 0) = f_y(0, 0).
Thus, at (0, 0), the plane has gradient
∇g(0, 0) = g_x(0, 0)i + g_y(0, 0)j
= f_x(0, 0)i + f_y(0, 0)j
= ∇f(0, 0),
which is the gradient of z = f(x, y) at (0, 0).


2.8 We have f(0, 0) = 1,

f_x(0, 0) = 2, f_y(0, 0) = 3,
f_xx(0, 0) = 4, f_yy(0, 0) = 9,
f_xy(0, 0) = 6.

Substituting these values into Equation (2.7), we obtain

p_2(x, y) = 1 + 2x + 3y + ½(4x² + 12xy + 9y²)
= 1 + 2x + 3y + 2x² + 6xy + (9/2)y².
2.9 We have U(0, 0) = −2mga and
∂U/∂θ = mga sin θ, ∂U/∂φ = mga sin φ,
so
(∂U/∂θ)(0, 0) = (∂U/∂φ)(0, 0) = 0.
We also see that
∂²U/∂θ² = mga cos θ, ∂²U/∂φ² = mga cos φ, ∂²U/∂θ∂φ = 0,
so
(∂²U/∂θ²)(0, 0) = (∂²U/∂φ²)(0, 0) = mga, (∂²U/∂θ∂φ)(0, 0) = 0.


It follows from Equation (2.7) that the second-order
Taylor polynomial about (0, 0) is

p_2(θ, φ) = −2mga + ½mga(θ² + φ²).


2.10 f_x = ye^(xy) + 2(x + y) and f_y = xe^(xy) + 2(x + y),
so
f_xx = y²e^(xy) + 2,
f_xy = e^(xy) + xye^(xy) + 2,
f_yy = x²e^(xy) + 2.

Therefore
f(0, 0) = 1,
f_x(0, 0) = f_y(0, 0) = 0,
f_xx(0, 0) = 2, f_xy(0, 0) = 3, f_yy(0, 0) = 2.

Thus, from Equation (2.7), the second-order Taylor
polynomial about (0, 0) is

p_2(x, y) = 1 + ½(2x² + 2(3xy) + 2y²)
= 1 + x² + 3xy + y².
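
To see how well the second-order Taylor polynomial of Solution 2.10 approximates the function near the origin, the following sketch compares the two at a pair of nearby points. It assumes f(x, y) = e^(xy) + (x + y)², a form consistent with the partial derivatives in the solution. The sample points are arbitrary choices made for illustration.

```python
import math

def f(x, y):
    # Assumed form, consistent with f_x = y e^(xy) + 2(x + y)
    # and f_y = x e^(xy) + 2(x + y).
    return math.exp(x * y) + (x + y) ** 2

def p2(x, y):
    # Second-order Taylor polynomial about (0, 0) from the solution.
    return 1.0 + x * x + 3.0 * x * y + y * y

# Approximation error at (h, h) for two small values of h.
errors = {h: abs(f(h, h) - p2(h, h)) for h in (0.1, 0.01)}
```

Along the line y = x the error is e^(h²) − 1 − h², which is roughly h⁴/2, so reducing h by a factor of 10 reduces the error by a factor of about 10 000.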




2.11 f_x(x, y) = 3(x² + 2y² − 3xy)²(2x − 3y) and
f_y(x, y) = 3(x² + 2y² − 3xy)²(4y − 3x), so
f_xx(x, y) = 6(x² + 2y² − 3xy)(2x − 3y)²
+ 6(x² + 2y² − 3xy)²,
f_xy(x, y) = 6(x² + 2y² − 3xy)(2x − 3y)(4y − 3x)
− 9(x² + 2y² − 3xy)²,
f_yy(x, y) = 6(x² + 2y² − 3xy)(4y − 3x)²
+ 12(x² + 2y² − 3xy)².
If x = 1 and y = −1, then

f_xx(1, −1) = 6(1 + 2 + 3)(2 + 3)² + 6(1 + 2 + 3)²
= 1116,
f_yy(1, −1) = 6(1 + 2 + 3)(−4 − 3)² + 12(1 + 2 + 3)²
= 2196.
Similarly, f_xy(1, −1) = −1584.
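
The three values in Solution 2.11 involve a fair amount of arithmetic, so a direct evaluation is a useful check. The sketch below codes the second partial derivatives from the solution, with u standing for x² + 2y² − 3xy. The function and variable names are hypothetical.

```python
def second_partials(x, y):
    """Return (f_xx, f_xy, f_yy) using the formulas from Solution 2.11."""
    u = x**2 + 2 * y**2 - 3 * x * y   # inner expression
    ux = 2 * x - 3 * y                # partial derivative of u with respect to x
    uy = 4 * y - 3 * x                # partial derivative of u with respect to y
    fxx = 6 * u * ux**2 + 6 * u**2
    fxy = 6 * u * ux * uy - 9 * u**2
    fyy = 6 * u * uy**2 + 12 * u**2
    return fxx, fxy, fyy

values_at_point = second_partials(1, -1)
```

At (1, −1) we have u = 6, 2x − 3y = 5 and 4y − 3x = −7, which gives 1116, −1584 and 2196 respectively.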


2.12 (a) f_x = 3(2 + x + 2y)² and f_y = 6(2 + x + 2y)²,
so
f_xx = 6(2 + x + 2y),
f_xy = 12(2 + x + 2y),
f_yy = 24(2 + x + 2y).
It follows that

f(0, 0) = 8,

f_x(0, 0) = 12, f_y(0, 0) = 24,

f_xx(0, 0) = 12, f_xy(0, 0) = 24, f_yy(0, 0) = 48.
Therefore

p_1(x, y) = f(0, 0) + f_x(0, 0)x + f_y(0, 0)y
= 8 + 12x + 24y,
p_2(x, y) = f(0, 0) + f_x(0, 0)x + f_y(0, 0)y
+ ½(f_xx(0, 0)x² + 2f_xy(0, 0)xy + f_yy(0, 0)y²)
= 8 + 12x + 24y + 6x² + 24xy + 24y².
Expanding the bracket in the expression for f, we obtain
f(x, y) = 8 + 12x + 24y + 6x² + 24xy + 24y²
+ x³ + 6x²y + 12xy² + 8y³.


Notice that the function f(x, y) is itself a polynomial of
degree three and that the Taylor polynomial of second
degree about (0, 0) consists of the terms of f of degree
less than three.


(b) f(1, −1) = 1, f_x(1, −1) = 3, f_y(1, −1) = 6,
f_xx(1, −1) = 6, f_xy(1, −1) = 12 and f_yy(1, −1) = 24.
It follows that

p_1(x, y) = f(1, −1) + f_x(1, −1)(x − 1) + f_y(1, −1)(y + 1)
= 1 + 3(x − 1) + 6(y + 1),
p_2(x, y) = f(1, −1) + f_x(1, −1)(x − 1) + f_y(1, −1)(y + 1)
+ ½(f_xx(1, −1)(x − 1)²
+ 2f_xy(1, −1)(x − 1)(y + 1)
+ f_yy(1, −1)(y + 1)²)
= 1 + 3(x − 1) + 6(y + 1) + 3(x − 1)²
+ 12(x − 1)(y + 1) + 12(y + 1)².
(c) F_X = 3(1 + X + 2Y)² and F_Y = 6(1 + X + 2Y)²,
so
F_XX = 6(1 + X + 2Y),
F_XY = 12(1 + X + 2Y),
F_YY = 24(1 + X + 2Y).
It follows that

F(0, 0) = 1,

F_X(0, 0) = 3, F_Y(0, 0) = 6,

F_XX(0, 0) = 6, F_XY(0, 0) = 12, F_YY(0, 0) = 24.
Therefore

p_1(X, Y) = 1 + 3X + 6Y,

p_2(X, Y) = 1 + 3X + 6Y + 3X² + 12XY + 12Y².

(By substituting X = x − 1 and Y = y + 1, these polynomials are the
same as those you obtained in part (b). Finding the Taylor polynomials
for f near (1, −1) is equivalent to making a suitable change of
variables and then calculating the Taylor polynomials near (0, 0).
This often leads to simpler arithmetic, because evaluations of the
partial derivatives at (0, 0) are often particularly easy. You can
then change variables back again to obtain the polynomial in terms of
the original variables.)
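
The relationship between the parts of Solution 2.12 can be tested with exact integer arithmetic. The sketch below assumes f(x, y) = (2 + x + 2y)³, consistent with the derivatives in part (a), and uses hypothetical function names. It checks that the remainder after subtracting each second-order Taylor polynomial is a pure cubic in the displacement from the expansion point.

```python
def f(x, y):
    # Assumed form, consistent with f_x = 3(2 + x + 2y)^2.
    return (2 + x + 2 * y) ** 3

def p2_about_origin(x, y):
    # Part (a): second-order Taylor polynomial about (0, 0).
    return 8 + 12 * x + 24 * y + 6 * x**2 + 24 * x * y + 24 * y**2

def p2_shifted(X, Y):
    # Part (c): polynomial in the shifted variables X = x - 1, Y = y + 1.
    return 1 + 3 * X + 6 * Y + 3 * X**2 + 12 * X * Y + 12 * Y**2

def p2_about_point(x, y):
    # Part (b), obtained from part (c) by changing variables back.
    return p2_shifted(x - 1, y + 1)

# Remainders on a small integer grid (exact, since everything is an integer).
grid = [(x, y) for x in range(-3, 4) for y in range(-3, 4)]
remainders_origin = [f(x, y) - p2_about_origin(x, y) for x, y in grid]
remainders_point = [f(x, y) - p2_about_point(x, y) for x, y in grid]
```

The first remainder equals (x + 2y)³ and the second equals ((x − 1) + 2(y + 1))³ = (x + 2y + 1)³, so each polynomial captures every term of degree less than three about its own expansion point.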


Section 3 


3.1 f_x(x, y) = 6x − 4y + 4 and
f_y(x, y) = −4x + 4y − 8, so there is one stationary
point, at the solution of the simultaneous equations
6x − 4y = −4,
−4x + 4y = 8,
which is x = 2, y = 4. Thus the only stationary point
is at (2, 4).


3.2 f_x = 2xy + y² − 3y = y(2x + y − 3) and
f_y = 2xy + x² − 3x = x(2y + x − 3), so, to find the
stationary points, we have to solve the simultaneous equations
y(2x + y − 3) = 0, (S.1)
x(2y + x − 3) = 0. (S.2)
From (S.1), we have two cases.
If y = 0, then (S.2) becomes x(x − 3) = 0; thus x = 0
or x = 3, so we have found the two solutions (0, 0) and
(3, 0).
If 2x + y − 3 = 0, i.e. y = 3 − 2x, then (S.2) becomes
x(3 − 3x) = 0, and we have x = 0 or x = 1. Substituting
these values for x into y = 3 − 2x gives y = 3 when
x = 0 and y = 1 when x = 1. Thus we have two more
solutions, (0, 3) and (1, 1).
So we have four stationary points: (0, 0), (3, 0), (0, 3)
and (1, 1).
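
The four stationary points found in Solution 3.2 can be confirmed by substituting them into the two first partial derivatives. The sketch below also searches a small integer grid as a sanity check that no integer point has been missed. The grid range and the names used are assumptions made for illustration, and the grid search says nothing about non-integer points (the case analysis above deals with those).

```python
def grad(x, y):
    # First partial derivatives in factorised form, from Solution 3.2.
    fx = y * (2 * x + y - 3)
    fy = x * (2 * y + x - 3)
    return fx, fy

candidates = [(0, 0), (3, 0), (0, 3), (1, 1)]
all_stationary = all(grad(x, y) == (0, 0) for x, y in candidates)

# Brute-force search over integer points with -5 <= x, y <= 5.
found = sorted(
    (x, y)
    for x in range(-5, 6)
    for y in range(-5, 6)
    if grad(x, y) == (0, 0)
)
```

Both checks agree: each candidate makes both partial derivatives vanish, and the grid search returns exactly the same four points.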
3.3 We have U_θ = mga sin θ and U_φ = mga sin φ, so
U_θ(0, 0) = U_φ(0, 0) = 0, which shows that (0, 0) is a
stationary point. Also, U(0, 0) = −2mga and, near (0, 0),
0 < cos θ < 1 and 0 < cos φ < 1, so U(θ, φ) > −2mga,
and it follows that this point is a local minimum.


3.4 f_x = 4x − y − 3 and f_y = −x − 6y + 7, so to find
the stationary point, we need to solve the simultaneous
equations
4x − y − 3 = 0,
−x − 6y + 7 = 0.
The solution is x = 1, y = 1, so the stationary point
is at (1, 1). Also, f_xx(1, 1) = 4, f_xy(1, 1) = −1 and
f_yy(1, 1) = −6, so AC − B² = −25; therefore the
stationary point is a saddle point.


3.5 f_x = 3x² − 12 and f_y = −3y² + 3, so the stationary
points are at the points (x, y) where x² = 4 and
y² = 1, namely (2, 1), (2, −1), (−2, 1), (−2, −1). Also,
f_xx = 6x, f_xy = 0 and f_yy = −6y. Thus, at (2, 1) and
(−2, −1), we have AC − B² = −36xy = −72, and these
are saddle points. At (2, −1) and (−2, 1), we have
AC − B² = −36xy = 72. Since A > 0 at (2, −1), this is
a local minimum; since A < 0 at (−2, 1), this is a local
maximum.
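
The AC − B² test used in Solution 3.5 is mechanical enough to express as a short function. The following sketch is one possible way to code it, with hypothetical names. It applies the test to the four stationary points using f_xx = 6x, f_xy = 0 and f_yy = −6y.

```python
def classify(A, B, C):
    """AC - B^2 test for a stationary point of a function of two variables."""
    d = A * C - B * B
    if d > 0:
        return "local minimum" if A > 0 else "local maximum"
    if d < 0:
        return "saddle point"
    return "inconclusive"  # the test gives no information when AC - B^2 = 0

def second_partials(x, y):
    # (A, B, C) = (f_xx, f_xy, f_yy) from Solution 3.5.
    return 6 * x, 0, -6 * y

points = [(2, 1), (2, -1), (-2, 1), (-2, -1)]
results = {pt: classify(*second_partials(*pt)) for pt in points}
```

The dictionary `results` reproduces the classification above: saddle points at (2, 1) and (−2, −1), a local minimum at (2, −1) and a local maximum at (−2, 1).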


3.6
[ p  q ] [ A  B ] [ p ]  =  [ p  q ] [ Ap + Bq ]
         [ B  C ] [ q ]              [ Bp + Cq ]

= p(Ap + Bq) + q(Bp + Cq)
= Ap² + 2Bpq + Cq².


3.7 First we find the stationary point by solving the
simultaneous equations

w_x = 6x − 2y − 2z = 0,

w_y = 6y − 2x − 2z = 0,

w_z = 8z − 2y − 2x = 0.
The only solution is x = y = z = 0, so the stationary
point is at (0, 0, 0).


Now w_xx = 6, w_yy = 6, w_zz = 8, w_xy = −2, w_xz = −2
and w_yz = −2, so the Hessian matrix is

[ w_xx  w_xy  w_xz ]   [  6  −2  −2 ]
[ w_yx  w_yy  w_yz ] = [ −2   6  −2 ]
[ w_zx  w_zy  w_zz ]   [ −2  −2   8 ]


The given characteristic equation of this matrix is
0 = (8 − λ)((6 − λ)² − 12)
= (8 − λ)(6 − λ + 2√3)(6 − λ − 2√3),
so the eigenvalues are 8, 6 − 2√3 and 6 + 2√3. These
are all positive, thus there is a local minimum at the
origin.
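
The eigenvalues 8, 6 − 2√3 and 6 + 2√3 quoted in Solution 3.7 can be checked by substituting them into det(H − λI), where H is the Hessian matrix with rows (6, −2, −2), (−2, 6, −2) and (−2, −2, 8). The sketch below does this with a hand-written 3 × 3 determinant. The helper names are hypothetical, and this is an illustration rather than the method used by the course's computer algebra package.

```python
import math

# Hessian matrix from Solution 3.7.
H = [[6, -2, -2],
     [-2, 6, -2],
     [-2, -2, 8]]

def det3(m):
    # Determinant of a 3 x 3 matrix by expansion along the first row.
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def char_poly(lam):
    # Value of det(H - lam * I); this is zero when lam is an eigenvalue.
    shifted = [[H[r][c] - (lam if r == c else 0.0) for c in range(3)]
               for r in range(3)]
    return det3(shifted)

eigenvalues = [8.0, 6.0 - 2.0 * math.sqrt(3.0), 6.0 + 2.0 * math.sqrt(3.0)]
residuals = [char_poly(lam) for lam in eigenvalues]
all_positive = all(lam > 0 for lam in eigenvalues)
```

Each residual is zero to within rounding error, and all three eigenvalues are positive, which is the condition for a local minimum.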


3.8 We have
w_x = 2x + 2√3 z, w_y = 4y, w_z = 2z + 2√3 x,
w_xx = 2, w_yy = 4, w_zz = 2,
w_xy = 0, w_xz = 2√3, w_yz = 0.
So the Hessian matrix is

[ 2    0   2√3 ]
[ 0    4   0   ]
[ 2√3  0   2   ]


The characteristic equation is (4 − λ)(λ² − 4λ − 8) = 0,
giving the eigenvalues 4, 2 − 2√3 and 2 + 2√3. Two of
these are positive and one is negative; thus the
stationary point is a saddle point.


3.9 Partially differentiating, f_x = 2Sx + 2Ty − 2v and
f_y = 2Tx + 2Uy − 2w. Thus, at any stationary point,
2Sx + 2Ty = 2v and 2Tx + 2Uy = 2w, i.e. Sx + Ty = v
and Tx + Uy = w, or, as a matrix equation,

[ S  T ] [ x ]  =  [ v ]     (S.3)
[ T  U ] [ y ]     [ w ]

Since the matrix on the left-hand side of (S.3) is XᵀX, the vector of
unknowns [x y]ᵀ is a, and the right-hand side [v w]ᵀ is Xᵀy,
(S.3) is identical to Equation (3.7). Provided that XᵀX
is invertible, (S.3) has a unique solution, so there is only
one stationary point.


3.10 (a) Partially differentiating,
f_x = −x(1 − x² + y²)^(−1/2),
f_y = y(1 − x² + y²)^(−1/2),
so the only stationary point is at (0, 0). We also have
f_xx = −(1 − x² + y²)^(−1/2) − x²(1 − x² + y²)^(−3/2),
so A = f_xx(0, 0) = −1. Since
f_xy = xy(1 − x² + y²)^(−3/2),
we have B = f_xy(0, 0) = 0. Also,
f_yy = (1 − x² + y²)^(−1/2) − y²(1 − x² + y²)^(−3/2),
so C = f_yy(0, 0) = 1.
So AC − B² = −1 < 0, and there is a saddle point at
the origin.


(b) T_x = −sin x and T_y = −sin y, so the stationary
points occur when sin x = 0 and sin y = 0, i.e. at the
points (nπ, mπ) where n and m are integers. We also
see that T_xx = −cos x, T_xy = 0 and T_yy = −cos y, so
at the stationary point (nπ, mπ) we have A = −cos nπ,
B = 0 and C = −cos mπ. There are three cases to consider.

If m and n are both even, then AC − B² = 1 > 0 and
A = −1 < 0, so there is a local maximum at (nπ, mπ).

If m and n are both odd, then AC − B² = 1 > 0 and
A = 1 > 0, so there is a local minimum at (nπ, mπ).

Otherwise, AC − B² = −1 < 0 and there is a saddle
point at (nπ, mπ).




3.11 Partially differentiating,
f_x = −x(1 − x² − y²)^(−1/2),
f_y = −y(1 − x² − y²)^(−1/2),
so the only stationary point is at the origin. We have
f_xx = −(1 − x² − y²)^(−1/2) − x²(1 − x² − y²)^(−3/2),
f_xy = −xy(1 − x² − y²)^(−3/2),
f_yy = −(1 − x² − y²)^(−1/2) − y²(1 − x² − y²)^(−3/2),
so A = f_xx(0, 0) = −1, B = f_xy(0, 0) = 0 and
C = f_yy(0, 0) = −1.
Since AC − B² = 1 > 0 and A = −1 < 0, there is a local
maximum at (0, 0).